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Quantization 
This module introduces the topic of quantization and prefaces the 
discussion of later modules in the course on source coding. 


e Given a continuous-time and continuous-amplitude signal x(t), 
processing and storage by modern digital hardware requires 
discretization in both time and amplitude, as accomplished by an 
analog-to-digital converter (ADC). 

e We will typically work with discrete-time quantities x(n) (indexed by 
“time” variable n) which we assume were sampled ideally and without 
aliasing. 

e Here we discuss various ways of discretizing the amplitude of x(n) so 
that it may be represented by a finite set of numbers. This is generally 
a lossy operation, and so we analyze the performance of various 
quantization schemes and design quantizers that are optimal under 
certain assumptions. A good reference for much of this material is the 
textbook by Jayant and Noll. 


Memoryless Scalar Quantization 

Memoryless scalar quantization is discussed, with a focus on the uniform quantizer. Uniform quantizer 
error variance is derived under the assumption of many quantization levels, and several examples are 
provided. 


e Memoryless scalar quantization of continuous-amplitude variable x is the mapping of x to output y, 
when x lies within interval 
Equation: 


2 c=1a;,< eo te}, WH 1, 2,360, 


The x; are called decision thresholds, and the number of quantization levels is L. The quantization 
operation is written y = Q(z). 

¢ When 0 € {y1,---, yz}, quantizer is called midtread, else midrise. 

* Quantization error defined gq := x — Q(z) 


(a) (b) 


rl Q(z) 


(a) Uniform and (b) non-uniform quantization Q(x) 
and quantization error q(x) 


¢ If xis ar.v. with pdf p, (-) and likewise for q, then quantization error variance is 
Equation: 


o, =E {q°} / q’ Pq (a)dq 


= [ @-Q(e))*r. (ede 
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e A special quantizer is the uniform quantizer: 
Equation: 


Yrr1— ye = A, for k=1,2,---,L-1, 
Be1—a,e = A, for finite xy, e441, 


—@1 = LE41 = OW. 


quantization with v2 = —max + A and xy = Lmax — A, and with y; = x2 — A/2 and 

Yk = £p + A/2 (for k > 1), the quantization error is well approximated by a uniform distribution 
for large L: 

Equation: 


a= a lal < A/2, 


~ (0 else. 
Why? 


© As L-+00, pz (x) is constant over X; for any k. Since g = 2 — yx|¢9;, it follows that 
Pq(q\@ € 2%) will have uniform distribution for any k. 

o With  € (—Xmax, Lmax) and with x, and yx as specified, g € (—A/2, A/2] for all x (see 
[link]). Hence, for any k, 


Equation: 
1/A qe (-A/2, 4/2), 
Pq (ale S 2x) ~ ‘ else. 
A q 
A/2 
ala eda a ae 
—~Zmax —A/2 ai —~Zmax 


Quantization error for bounded input and midpoint 
Yk 


In this case, from [link] (upper equation), 
Equation: 


ee 1 [3]? 1/4 A3 
a+ [eke SET M8) 
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If we use R bits to represent each discrete output y and choose L = 2%, then 
Equation: 


y) 2 
~ oe A ae 1 ( 22max = 12 972k 
: 12 12 L 3 


and 
Equation: 


o ae 2R x 
SNR [dB] = 10 logy) (22 } = 10 logy) (3-22-2?*) = | 6.02R— 10 logig (3 na ) 

OF Umax = 
Recall that the expression above is only valid for 0,7 small enough to ensure x € (—2max; Zmax)- 
For larger 0,7, the quantizer overloads and the SNR decreases rapidly. 


Example: 
SNR for Uniform Quantization of Uniformly-Distributed Input 
For uniformly distributed x, can show 2max/Oz = /3, so that SNR = 6. 02R. 


Example: 

SNR for Uniform Quantization of Sinusoidal Input) 

For a sinusoidal x, can show fmax/Or = /2, so that SNR = 6.02R + 1. 76. (Interesting since 
sine waves are often used as test signals). 


Example: 

SNR for Uniform Quantization of Gaussian Input 

Though not truly bounded, Gaussian x might be considered as approximately bounded if we 
choose 2max = 40, and ignore residual clipping. In this case SNR = 6.02R — 7. 27. 


MSE-Optimal Memoryless Scalar Quantization 

The mean-squared error minimizing scalar quantizer (the Lloyd-Max quantizer) is derived here using 
Lagrange optimization. Background on Lagrange optimization is also provided. Finally, error variance is 
derived for the asymptotic case of many quantization levels. 


e Though uniform quantization is convenient for implementation and analysis, non-uniform quantization 
yeilds lower 0,” when p; (@) is non-uniformly distributed. By decreasing |q(z)| for frequently 
occuring x (at the expense of increasing |q(x)| for infrequently occuring x), the average error power 
can be reduced. 

¢ Lloyd-Max Quantizer: MSE-optimal thresholds {x,} and outputs {y;, } can be determined given an 
input distribution p, (e), and the result is the Lloyd-Max quantizer. Necessary conditions on {2,} and 


{yx} are 

Equation: 
oa; 0 f k 2 L d ae 0 f k 1 L 
Or, ssl e {2,7%%, } an Oyun or E41, ie 


Using equation 2 from Memoryless Scalar Quantization (third equation), 0/0b ie jf @)de= FO), 
0/da ri f (z)dx = —f (a), and above, 


Equation: 
Oc? ox — WA pe lg...L 
a iy tei) e) — Gt re) = 0 SF ee t h 
Lk Ly = —0O, 274, = W, 
Oc2 Lk+1 pe xp,(x)dz 
qd a 

—— =2f (x — yx) Pz (x)dx = 0 yy, = +-——__, ke {1.--- L} 
OYk Lk ae pe(e)de 


It can be shown that above are sufficient for global MMSE when 0? log pz (x) /Ox? < 0, which holds 
for uniform, Gaussian, and Laplace pdfs, but not Gamma. Note: 


© optimum decision thresholds are halfway between neighboring output values, 
© optimum output values are centroids of the pdf within the appropriate interval, i.e., are given by 
the conditional means 


Equation: 
2(v,2€ 2 fie rp, (x)dx 
yy, =E {2|x €¢ Bye} = [+ (zlz € 2 )de = [= : dx, = - : 
Pr (2 € 2;) le Dz (x)dx 


Iterative Procedure to Find {x*} and {y;}: 


1. Choose 1. 

2. Fork =1,---,£-1, 
given % and x, solve [link] (lower equation) for 2x11, 
given 9, and x41, solve [link] (upper equation) for $,41.end; 

3. Compare #jr, to y;, calculated from [link] (lower equation) based on &, and x1,41 = oo. Adjust 4 
accordingly, and go to step 1. 


performance for large L. Here, we assume that 


© the pdf p, (x) is constant over x € 2%}, fork € {1,---, LD}, 
o the pdf pz (x) is symmetric about x = 0, 
o the input is bounded, i.e., 2 € (—%max;&max) for some (potentially large) Xmax- 


So with assumption 
Equation: 


P(e) =pe(yz,) for x,y, € 2 


and definition 
Equation: 


Ay t= Bei — @k, 


we can write 
Equation: 


Py i= Pr{ee Zi} = polyp) Op (where we require ) > Py = 1) 


and thus, from equation 2 from Memoryless Scalar Quantization(lower equation), 6,7 becomes 
Equation: 


For MSE-optimal {y;,}, know 
Equation: 


002 P, Thi 
0= 2 = 2-4 i] (c—yp)da => | ye = eeu 
zx 


which is expected since the centroid of a flat pdf over X; is simply the midpoint of X;. Plugging yj, into 
[link], 


Equation: 
L 
P, 3 Lk+1 
oy = x—2,/2—2 2 
t= 2 gay [28/2 — aess/2)"| 
+. P, 3 3 
= SS | (@ni1/2 — 24/2) — (xp /2 — e41/2) 
k=1 
L 3 L 
= 2 = — P, Ay. 


Note that for uniform quantization (A, = A), the expression above reduces to the one derived earlier. 
Now we minimize 0,7? w.r.t. {Ax}. The trick here is to define 
Equation: 


1g ly 
an := ¢/pr (yt) A, so that o? = Ty De Pe (viAL a ae 
k= 


For p, (x) constant over X;,and yz, € 2%; 
Equation: 


m= vhs oe 


we have the following constrained optimization problem: 
Equation: 


=f “ele Dz ( = C, (a known constant), 
—Zm, 


»» 


yp= ee ai bd 


This may be solved using Lagrange multipliers. 


Note: 

Optimization via Lagrange Multipliers 

Consider the problem OF minimizing N-dimensional real-valued cost function J(x), where 

x = (a1,%,---, xy)’, subject to M <N real-valued equality constraints fm (x) = am, 

m =1,---,M. This may be converted into an unconstrained optimization of dimension N-+M by 
introducing additional variables X = (Aq, ---, rey known as Lagrange multipliers. The 
uncontrained cost function is 

Equation: 


UC) = Je (x) + Am (a) arn) 


and necessary conditions for its minimization are 
Equation: 


a 
aq Ju(%A)=0 © Sax ) + ange =0 


3) 


py Ju A) = 0 —- iene ore for fi, = ileas AWE. 


The typical procedure used to solve for optimal x is the following: 
1. Equations for x,,n = 1,---, N, in terms of {Am} are obtained from [link] (upper equation). 


2. These N equations are used in [link] (lower equation) to solve for the M optimal A,,. 
3. The optimal {X,,} are plugged back into the N equations for x, yielding optimal {z,, }. 


Necessary conditions are 
Equation: 


VE, #,(Det+a(Sar-c.}) =0 > \=-307 = y= -n/3 
3) 


which can be combined to solve for A: 


Equation: 
L 2 
fod C. 
; = C4 =-3(—"]}. 
3 X 3( L ) 


k=1 


Plugging A back into the expression for ae, we find 
Equation: 


ay = Caf hy VE 


Using the definition of az, the optimal decision spacing is 


Equation: 
fe SO 522) ROE 
L4/pz (yx) ee 


and the minimum quantization error variance is 
Equation: 


(Sm, x/p=(@)dz) 


2 : _ 1 * 3 1 


k 
3 
cai (S2", «/pe (wae) | 


An interesting observation is that a7 


3, the £*” interval's optimal contribution to et: is constant over €. 


Entropy Coding 
In this module, the concepts of entropy and variable-length coding are introduced, 
motivating the description of the Huffman encoder. 


¢ Binary Scalar Encoding: Previously we have focused on the memoryless scalar 
quantizer y = Q(x), where y takes a value from a set of L reconstruction levels. By 
coding each quantizer output in binary format, we transmit (store) the information at a 
rate (cost) of 
Equation: 


R= /log, L| bits/sample. 


If, for example, L = 8, then we transmit at 3 bits/sample. Say we can tolerate a bit 
more quantization error, e.g., as results from L = 5. We hope that this reduction in 
fidelity reduces our transmission requirements, but with this simple binary encoding 
scheme we still require R = 3 bits/sample! 

e Idea—Block Coding: Let's assign a symbol to each block of 3 consecutive quantizer 
outputs. We need a symbol alphabet of size > 5° = 125, which is adequately 
represented by a 7-bit word (2’ = 128). Transmitting these words requires only 
7/3 = 2.33 bits/sample! 

e Idea—Variable Length Coding: Assume some of the quantizer outputs occur more 
frequently than others. Could we come up with an alphabet consisting of short words 
for representing frequent outputs and longer words for infrequent outputs that would 
have a lower average transmission rate? 


Example: 

Variable Length Coding) 

Consider the quantizer with L = 4 and output probabilities indicated in [link]. 
Straightforward 2-bit encoding requires average bit rate of 2 bits/sample, while the 
variable length code in [link] gives average 

R= >>, Peng = 0.6-140.25-2+0.1-3 + 0.05-3 = 1.55 bits/sample. 


output P, code 
y1 0.60 0 


4 0.05 AE 


Exercise: 


Problem: 


Given an arbitrarily complex coding scheme, what is the minimum bits/sample 
required to transmit (store) the sequence {y(n) }? 


Solution: 


When random process {y(7) } is i.i.d., the minimum average bit rate is 
Equation: 


Ron = yA, 


where Hy is the entropy of random variable y(n) in bits: 
Equation: 


Hy = — 


L 
Py logs Py, 


k=1 


and € is an arbitrarily small positive constant (see textbooks by Berger and by 
Cover & Thomas). 


Notes: 


o Entropy obeys 0 < Hy, <log, L. The left inequality occurs when P; = 1 for 
some k, while the right inequality occurs when P;, = 1/Z for every k. 

o The term entropy refers to the average information of a single random variable, 
while the term entropy rate refers to a sequence of random variables, i.e., a 
random process. 

o When {y(7)} is not independent (the focus of later sections), a different 
expression for Rin applies. 

o Though the minimum rate is well specified, the construction of a coding scheme 
which always achieves this rate is not. 


Example: 

Entropy of Variable Length Code 

Recalling the setup of [link], we find that 

H, = —(0.6 log, 0.6 + 0. 25 log, 0.25 + 0.1 log, 0.1 + 0.05 log, 0.05) = 1. 49 
bits. Assuming iid. {y(n)}, we have Rimin = 1. 49 bits/sample. Compare to the 
variable length code on the right which gave R = 1. 55 bits/sample. 


output Pe code 
Ja 0.60 0 

y2 Or25 01 
V3 0.10 011 
Ya 0.05 111 


¢ Huffman Encoding: Given quantizer outputs y, or fixed-length blocks of outputs 
(Y; Y ye), the Huffman procedure constructs variable length codes that are optimal in 
certain respects (see Cover & Thomas). For example, when the probabilities of { P, } 
are powers of 1/2 (and {y(7n)} is i.id.), the entropy rate of a Huffman encoded output 
attains Ry in- 


Note: 
Huffman Procedure (Binary Case) 


1. Arrange ouput probabilities P;, in decreasing order and consider them as leaf 
nodes of a tree. 
2. While there exists more than one node: 


« Merge the two nodes with smallest probability to form a new node whose 
probability equals the sum of the two merged nodes. 
» Arbitrarily assign 1 and 0 to the two branches of the merging pair. 


3. The code assigned to each output is obtained by reading the branch bits 
sequentially from root note to leaf node. 


Example: 

Huffman Encoder Attaining Ryn 

In [link], a Huffman code was constructed for the output probabilities listed below. 
Here H, = —(0.5 log, 0.5 + 0. 25 log, 0.25 + 2- 0.125 log, 0.125) = 1.75 
bits, so that Rmin = 1.75 bits/sample (with the i.i.d. assumption). Since the average 
bit rate for the Huffman code is also 

R = 0.5-14+0.25-2+0.125-3+0.125-3 = 1.75 bits/sample, Huffman 
encoding attains R,»jn for this output distribution. 


output Pie code 
J1 0.5 0 

Yo 0.25 01 
y3 0.125 011 


yA 0.125 111 


Quantizer Design for Entropy Coded Sytems 

Motivated by the cascade of memoryless quantization and entropy coding, the entropy-minimizing scalar 
memoryless quantizer is derived. Using a compander formulation and tools from the calculus of variations, it is 
shown that the entropy-minimizing quantizer is the simple uniform quantizer. The penalty associated with 
memoryless quantization is then analyzed in the asymptotic case of many quantization levels. 


e Say that we are designing a system with a memoryless quantizer followed by an entropy coder, and our goal 
is to minimize the average transmission rate for a given Og? (or vice versa). Is it optimal to cascade a O,°- 
minimizing (Lloyd-Max) quantizer with a rate-minimizing code? In other words, what is the optimal 
memoryless quantizer if the quantized outputs are to be entropy coded? 

e A Compander Formulation: To determine the optimal quantizer, 


1. consider a companding system: a memoryless nonlinearity c(z) followed by uniform quantizer, 
2. find c(x) minimizing entropy Hy for a fixed error variance o,7. 


c(x) 


Compander curve: nonuniform input regions mapped to 
uniform output regions (for subsequent uniform quantization) 


e First we must express Og and Hy in terms of c(z). [link] suggests that, for large L, the slope 
c’ (x) := de(x)/dzx obeys 


Equation: 
j — 22max /L 
c (2) |ne x, _ A; ? 
so that we may write 
Equation: 
22 max 


gas .. 
Le (z) rE Ry, 


(lower equation) can be transformed as follows. 
Equation: 


— A; sinceP; = pz (x) A; for large L 
LER, LER, 
2 Z max 


ama €!(x)” 


Similarly, 
Equation: 


a 
] 


L 
—~S°P, log, Pi 
k= 


L 
= —) pz (x)As log, (pe (2) Ax) 
k= 


re ®, 
L L 
= —So pz (x)Ar log, p2(z) = — Spr (x) Ap logy Ax 
k= a re %, 
Emax Emax on 
max 
= -{ Px (2) logy pz (x)dx — / Pz (a) log, i dz 
—Zmax max Lc (z) 
hz: * differential entropy” Ay 
nx Zmax Zmax 
= h,- log, asf Dx (x)dx +f Dx (x) log, c’ (x)dzx 
=1 
max 
= constant + / Px (x) logy c' (x)dx 
—£ max 


fixed error variance 0,°. We employ a Lagrange technique again, minimizing the cost 


ae Px (x) log, c’ (a)dzx under the constraint that the quantity Best Dz (x)(c' (x)) da equals a constant 
C. This yields the unconstrained cost function 
Equation: 


2 


Ji (c! (x), d) = f Ibe (x) log, ce’ (a) + A(Pe (x) fe (x)) 


g(c'(x),A) 


= C) dz, 


with scalar A, and the unconstrained optimization problem becomes 
Equation: 


in Jy (c' (x), A). 
dan ele) 2) 
The following technique is common in variational calculus (see, e.g., Optimal Systems Control by Sage & 
White). Say a* (a) minimizes a (scalar) cost J(a(a)). Then for any (well-behaved) variation n(a) from this 


optimal a* (a), we must have 
Equation: 


FIle" (2) +en(0)) = 0 


where € is a scalar. Applying this principle to our optimization problem, we search for c’ (x) such that 
Equation: 


6) 
Vn(z), —Jy(c' (x) +en(z),A) =0. 
Oe e=0 
From [link] we find (using log, a =log, e- log, a) 
Equation: 
OJ _ 7 Oo : 
Be cy [~ FAG (x) + en (zx), d) Pas 


= i - [Po (x) log, (e) log, (c’ (x) + en(x)) + d( 2 (x)(c' (x) + en (z)) -2 c)] 


Bmax e=0 


[> ve (@(€ (@)) [loss (6) =2a(€' @)) 7] n(2) de 


max 


and to allow for any n(x) we require 
Equation: 


2X 
log, e- 


logy (e) — 2A(c' (() =0 = ¢(z)= 


a constant! 


Applying the boundary conditions, 
Equation: 


c ( Picea) = Emax 


Ble inter > c(z) =a |. 


Thus, for large-L, the quantizer that minimizes entropy rate Hy for a given quantization error variance ag is 
the uniform quantizer. Plugging c(2) = z into [link], the rightmost integral disappears and we have 
Equation: 


22 max 
L ? 
A 


H,| = hz— logy 


uniform 


and using the large-L uniform quantizer error variance approximation equation 6 from Memoryless Scalar 
Quantization, 
Equation: 


H,| 


1 2 
uniform — hg — 29 logs (1207). 
It is interesting to compare this result to the information-theoretic minimal average rate for transmission of a 
continuous-amplitude memoryless source x of differential entropy h, at average distortion 047(see Jayant & 
Noll or Berger): 
Equation: 


e= 


iz flog, (e) pe (x)(e' («) + en(x)) n(x) — 2Ape (x) (c! (x) + €n(a)) “*n(@)| on 


1 
Rin = hz - - logs (27e Ge): 


Comparing the previous two equations, we find that (for a continous-amplitude memoryless source) uniform 
quantization prior to entropy coding requires 
Equation: 


1 
3 1082 (=) = | 0.255 bits/sample 


more than the theoretically optimum transmission scheme, regardless of the distribution of x. Thus, 0.255 
bits/sample (or ~ 1.5 dB using the 6. 02R relationship) is the price paid for memoryless quantization. 


Adaptive Quantization 

Motivated by the practical problem of non-stationary sources, adaptation of the uniform 
quantizer's stepsize is discussed. In particular, adaptive quantization based on forward estimation 
(AQF) and backward estimation (AQB) are discussed, in both block-based and recursive forms. 


e Previously have considered the case of stationary source processes, though in reality the 
source signal may be highly non-stationary. For example, the variance, pdf, and/or mean may 
vary significantly with time. 

e Here we concentrate on the problem of adapting uniform quantizer stepsize A to a signal with 
unknown variance. This is accomplished by estimating the input variance G, (7) and setting 
the quantizer stepsize appropriately: 

Equation: 


_ 262 Fx (n) 


A (n) = 


Here @, is a constant that depends on the distribution of the input signal x whose function is 
to prevent input values greater than a, (7) from being clipped by the quantizer (see [link]); 
comparing to non-adaptive step size relation A = 22%max/L, we see that p2Ox (n) ~ Lmax- 


eo 


om dbx Ox 


Adaptive quantization stepsize 
6(n) = 2y26,/L 


¢ As long as the reconstruction levels {y;, } are the same at encoder and decoder, the actual 
values chosen for quantizer design are arbitrary. Assuming integer values as in [link], the 
quantization rule becomes 
Equation: 


in) Ea — + midrise, 
y(n) = 


a(n) : 
At) | midtread. 


¢ AQF and AQB: [link] shows two structures for stepsize adaptation: (a) adaptive quantization 
with forward estimation (AQF) and (b) adaptive quantization with backward estimation 
(AQB). The advantage of AQF is that variance estimation may be accomplished more 
accurately, as it is operates directly on the source as opposed to a quantized (noisy) version 
of the source. The advantage of AQB is that the variance estimates do not need to be 
transmitted as side information for decoding. In fact, practical AQF encoders transmit 
variance estimates only occasionally, e.g., once per block. 


(a) (b) 
2(n) Q vin) ea ale a(n) | Q yoy gt”) gn (n) 
Esimetor + aia = channel === ~--' a emt eaieeer [7 
(a) AQF and (b) AQB 


¢ Block Variance Estimation: When operating on finite blocks of data, the structures in [link] 
perform variance estimation as follows: 


Equation: 
2 Loo, 
Block AQF: G7, (n) = W D x” (n — i) 
A2 1 v > “\\2 
Block AQB: Gi(n) = W > (y(n — i) - A(n —%)) 


N is termed the learning period and its choice may significantly impact quantizer SNR 
performance: choosing N too large prevents the quantizer from adapting to the local statistics 
of the input, while choosing N too small results in overly noisy AQB variance estimates and 
excessive AQF side information. [link] demonstrates these two schemes for two choices of 
N. 


(a) AQF, N=128 


4000 4200 4400 4600 4800 5000 5200 5400 5600 5800 6000 
(c) AQF, N=32 


(d) AQB, N=32 


Block AQF and AQB estimates of o (n) superimposed on 
|x(n)| for N = 128, 32. SNR acheived: (a) 22.6 dB, (b) 
28.8 dB, (c) 21.2 dB, and (d) 28.8 dB. 


e Recursive Variance Estimation: The recursive method of estimating variance is as follows 
Equation: 


Recursive AQF: G2(n) = aG?(n—1)+(1—a)a?(n—1) 


x 


Recursive AQB: 62(n) = a6? (n—1)+(1—a)(y(n—1)-A(n—1))’. 


where a is a forgetting factor in the range 0 < a@ < 1 and typically near to 1. This leads to an 
exponential data window, as can be seen below. Plugging the expression for ¢? (n — 1) into 
that for G? (n), 

Equation: 


e2(n) = a(ae? (n— 2) + (1—-a)x?(n—2)) + (1-a)2z? (n—-1) 


62 (n — 2) + (1—a)(z? (n— 1) + 0x? (n — 2)). 


Then plugging G2 (n — 2) into the above, 

Equation: 

a2 (n) = a (ae? (n— 3) +(1—a)z?(n— 3)) + (1 —a)(2? (n— 1) + a2? (n — 2)) 
oF? (n — 3) + (1 —a)(2? (n — 1) + ax? (n — 2) + a? 2? (n — 3)). 


Continuing this process N times, we arrive at 
Equation: 


N 
62 (n) = (l—a)) at 12? (n—i) + 0%G (n—N). 
i=1 


Taking the limit as N — oo, a < 1 ensures that 
Equation: 


(a) AQF, lambda=0.9 


(b) AQB, lambda=0.9 


4000 4200 4400 4600 4800 5000 5200 5400 5600 5800 6000 


4000 4200 4400 4600 4800 S000 5200 5400 5600 5800 6000 


Exponential AQF and AQB estimates of o, (n) 
superimposed on |x(n)| for A = 0. 9,0. 99. (a) 20.5 dB, 
(b) 28.0 dB, (c) 22.2 dB, (d) 24.1 dB. 


Pulse Code Modulation 

This module prefaces the subsequent modules on differential pulse code 
modulation (DPCM). In this module, the standard technique known as pulse 
code modulation (PCM) is described. 


e PCM is the “standard” sampling method taught in introductory DSP 
courses. The input signal is 


1. filtered to prevent aliasing, 
2. discretized in time by a sampler, and 
3. quantized in amplitude by a quantizer 


before transmission (or storage). Finally, the received samples are 
interpolated in time and amplitude to reconstruct an approximation of 
the input signal. Note that transmission may employ the use of 
additional encoding and decoding, as with entropy codes. PCM is the 
most widespread and well-understood digital coding system for 
reasons of simplicity, though not efficiency (as we shall see). 


anti-alias palais ‘ nhs | (transmit/| ee : 
filter quantize }———> encode | sore | decode |— interp_ |-—> 
me a(n) y(n) i... Since Se ee ey. ' 9(n) &(t) 


Standard PCM system 


Differential Pulse Code Modulation 

Differential pulse code modulation (DPCM) is described. First, quantized predictive 
encoding is motivated but then shown to suffer from amplification of quantization 
error at the decoder. This problem is avoided by DPCM, which places the quantizer 
in the prediction loop. 


e Many information signals, including audio, exhibit significant redundancy 
between successive samples. In these situations, it is advantageous to transmit 
only the difference between predicted and true versions of the signal: with a 
“good” predictions, the quantizer input will have variance less than the original 
signal, allowing a quantizer with smaller decision regions and hence higher 
SNR. (See [link] for an example of such a structure.) 

e Linear Prediction: There are various methods of prediction, but we focus on 
forward linear prediction of order N, illustrated by [link] and described by the 
following equation, where % (7) is a linear estimate of x(n) based on N 
previous versions of x(n): 

Equation: 


@(n) = Dd hit (n —i). 


It will be convenient to collect the prediction coefficients into the vector 
t 
= (his has=**5 hy) . 


«) —} } 


#(n) 


Linear Prediction 


¢ Lossless Predictive Encoding: Consider first the system in [link]. 


x(n) aD e(n) —e(n) 


A 


Lossless Predictive Data Transmission System 


The system equations are 


Equation: 

e(n) =2x(n)—Z(n 

y(n) =e(n) + y(n 
In the z-domain (i.e., X (z) = So. g(njz ” end A (2) = S- hiz *), 
Equation: 


E(z) = X(z)-X(z) =X (2)(1- H(z) 
Y(z) = E(z)+¥(z2) =E(z)+H(2Y (2) 


We call this transmission system lossless because, from above, 
Equation: 


¥ (@)= ae = xX 


Without quantization, however, the prediction error e(n) takes on a continuous 
range of values, and so this scheme is not applicable to digital transmission. 


¢ Quantized Predictive Encoding: Quantizing the prediction error in [link], we 
get the system of [link]. 
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Quantized Predictive Coding System 


Here the equations are 
Equation: 


q(n) = €(n)—e(n) 
én). = en) =2(n) 
y(n) = &€(n) + y(n) 


In the z-domain we find that 
Equation: 


E(z) = X(2)(1- H(z)) + Q(z) 
E(z) 


Q(z) 
1— H(z) 


Thus the reconstructed output is corrupted by a filtered version of the 
quantization error where the filter (1 — H(z)) * is expected to amplify the 
quantization error; recall that Y (z) = E(z)(1 — H(z))~' where the goal of 
prediction was to make a2 < o?. This problem results from the fact that the 
quantization noise appears at the decoder's predictor input but not at the 
encoder's predictor input. But we can avoid this... 

e DPCM: Including quantization in the encoder's prediction loop, we obtain the 
system in [link], known as differential pulse code modulation . 
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A Typical Differential PCM System 


System equations are 


Equation: 
q(n) = &(n)—e(n) 
e(n) = a(n) — &(n) 
Z(n) = &(n)+2#(n) 
y(n) = €(n)+9(n) 


In the z-domain we find that 
Equation: 


(2) = (Q(z) + E(z)) + (X (2) — E(z)) =X (2) + Q(z) 
E (2) 
Y@) = Tra@ 
so that 
Equation: 


y() =< FA—HOHO+O0 +20 _ iy .9(9 = R(0) 
— H(z) 
Thus, the reconstructed output is corrupted only by the quantization error. 
Another significant advantage to placing the quantizer inside the prediction 
loop is realized if the predictor made self-adaptive (in the same spirit as the 
adaptive quantizers we studied). As illustrated in [link], adaptation of the 
prediction coefficients can take place simulateously at the encoder and decoder 
with no transmission of side-information (e.g. h(n))! This is a consequence of 
the fact that both algorithms have access to identical signals. 
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Adaptive DPCM System 


Performance of DPCM 

Here we characterize the performance of DPCM via the simpler surrogate known as 
"quantized predictive encoding", which is known to have very similar performance in practice. 
To do this, we derive the optimum prediction coefficients, the resulting prediction error 
variance and gain over PCM. 


¢ As we noted earlier, the DPCM performance gain is a consequence of variance reduction 
obtained through prediction. Here we derive the optimal predictor coefficients, prediction 
error variance, and bit rate for the system in figure 4 from Differential Pulse Code 
Modulation. This system is easier to analyze than DPCM systems with quantizer in loop 
(e.g., figure 5 from Differential Pulse Code Modulation) and it is said that the difference 
in prediction-error behavior is negligible when R > 2(see page 267 of Jayant & Noll). 

* Optimal Prediction Coefficients: First we find coefficients h minimizing prediction error 
variance: 
Equation: 


minE fe? (n)}. 


Throughout, we assume that 2(n) is a zero-mean stationary random process with 
autocorrelation 
Equation: 


r, (k) :=E {a (n)z (n — k)} = 1, (—k). 


A necessary condition for optimality is the following: 
Equation: 


Vje{l,---,N}, 0 = ~2~—E {e?(n)} 


= E {e(n)z(n — j)} < The” OrthogonalityPrinciple” 
= Bf (=m -Yomair-d}amr—a)} 
= E {x(n)a(n— j)} — hi E {a(n —i)a(n — j)} 


= #5 (3) — do hirs (j -3) 


where we have used equation 1 from Differential Pulse Code Modulation. We can rewrite 
this as a system of linear equations: 
Equation: 


1) rz (0) i; (1) “++ Tz (N-1) hy 

Te (2) rz (1) re (0) “++ Ty (N—2) ho 
Tx (N) Tx (N-1) Tr, (N—2) Pe (0) hn 

——— ———— SS Ss 


Tx Ry h 


which yields an expression for the optimal prediction coefficients: 
Equation: 


hy = Ray fx: 


¢ Error for Length-N Predictor: The definition x (n) := (x(n), a(n—1),---,a(n—N))’ 
and [link] can be used to show that the minimum prediction error variance is 
Equation: 


o = E fe? (n) } 


= I (sit 


= (1 —h{) E {x(n)x' (n)} (...) 


=a wy) (" &)C4) 


= r,(0)—2hir, + hiRyh, 


Tz (0) — r}Ry Te. 


Error for Infinite-Length Predictor: We now characterize o2 | min.v a8 NV — oo. Note that 


aia 


e—. 
Ryu 


Equation: 


Using Cramer's rule, 
Equation: 


1= ° Ry 3 [Ry 2 — [Rwal 
IR “| nin, |Buy | ; IRv| 
min,N 
Note: 
Cramer's Rule 
Given matrix equation Ay = b, where A = (aj, a2,---, ay) € RX*%, 
Equation: 
Ye = |ay,- ae »Ap_1, b, Ags: +, ay 
|A] 


where |-| denotes determinant. 


A result from the theory of Toeplitz determinants (see Jayant & Noll) gives the final 
answer: 
Equation: 


_ | Rvi| 7 a 
Felmin= lim TEx) =| exp (az J, mn Sz (e™) dw) 


where S', (e%) is the power spectral density of the WSS random process x(n): 
Equation: 


(Note that, because r, (7) is conjugate symmetric for stationary x(n), Sz (ec) will 
always be non-negative and real.) 

ARMA Source Model: If the random process x(n) can be modelled as a general linear 
process , i.e., white noise v(m) driving a causal LTI system B(z): 

Equation: 


x(n) =u(n) + S° bgv(n — k) with S> |bxl? < OO; 
k=1 k 


then it can be shown that 
Equation: 


2? =f. jw ) 
O°, = exp (5 [ Saee )dw ; 


Thus the MSE-optimal prediction error variance equals that of the driving noise u(n) 
when N = oo. 

e Prediction Error Whiteness: We can also demonstrate that the MSE-optimal prediction 
error is white when N = oo. This is a simple fact of the orthogonality principle seen 
earlier: 

Equation: 


0 =Ef{e(n)a(n—B)}, k= 1,2,-~ 


The prediction error has autocorrelation 
Equation: 


E {e(n)e(n—k)} = E fe (: (n — k) + hie (n — -o)} 


E {e(n)a(n —k)} + 3 hE {e(n)a(n — k — i)} 
0fork>0 os 30 


= 0 |mind (k). 


e AR Source Model: When the input can be modelled as an autoregressive (AR) process of 
order N: 
Equation: 


1 
X(z) = a __ V (2), 
( ) Ltajyz-!+agqz727 +---+tanz7% ( ) 


then MSE-optimal results (i.e., oe = o lev and whitening) may be obtained with a 


forward predictor of order N. Specifically, the prediction coefficients h; can be chosen as 
h; = a; and so the prediction error F(z) becomes 
Equation: 


1tayz-!+aqz-2 +---+anz% 
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V(z) = V2), 


¢ Efficiency Gain over PCM: Prediction reduces the variance at the quantizer input without 
changing the variance of the reconstructed signal. 


o By keeping the number of quantization levels fixed, could reduce quantization step 
width and obtain lower quantization error than PCM at the same bit rate. 

o By keeping the decision levels fixed, could reduce the number of quantization levels 
and obtain a lower bit rate than PCM at the same quantization error level. 


Assuming that (7) and e(7) are distributed similarly, use of the same style of quantizer 
on DPCM vs. PCM systems yields 
Equation: 


2 
Oo 
SNRppcm = SNRpcm + 10 logo = 


e€ 


Analysis of DPCM using Rate-Distortion Theory 

Using rate-distortion theory, the optimal SNR attainable for rate-R source 
coding is related to R and the spectral flatness measure of the source. The 
SNR of rate-R DPCM is then analyzed and compared to the optimal, and 
shown to suffer by only 1.53 dB. 


e The rate-distortion functionR(D) specifies the minimum average rate 
R required to transmit the source process at a mean distortion of D, 
while the distortion-rate functionD(R) specifies the minimum mean 
distortion D resulting from transmission of the source at average rate 
R. These bounds are theoretical in the sense that coding techniques 
which attain these minimum rates or distortions are in general 
unknown and thought to be infinitely complex as well as require 
infinite memory. Still, these bounds form a reference against which 
any specific coding system can be compared. For a continuous- 
amplitude white (i.e., “memoryless”) Gaussian source x(n) (see 
Berger and Jayant & Noll), 

Equation: 


The sources we are interested in, however, are non-white. It turns out 
that when distortion D is “small,” non-white Gaussian x(n) have the 
following distortion-rate function: (see page 644 of Jayant & Noll) 


Equation: 
27% exp a a In S (ec) dw 
Ga fee . 


n> exp (+ f". mS, (e3”) dw) 
= H J" Se (e#) dw 


D(R) 


— ed 


spectral flatness measure 


Note the ratio of geometric to arithmetic PSD means, called the 
spectral flatness measure. Thus optimal coding of x(n) yields 
Equation: 


SNR(R) = 10 log, (5) 


Q 


To summarize, [link] (lower equation) gives the best possible SNR for 
any arbitrarily-complex coding system that transmits/stores 
information at an average rate of R bits/sample. 

Let's compare the SNR-versus-rate performance acheivable by DPCM 
to the optimal given by [link] (lower equation). The structure we 
consider is shown in [link], where quantized DPCM outputs é (n) are 
coded into binary bits using an entropy coder. Assuming that é (7) is 
white (which is a good assumption for well-designed predictors), 
optimal entropy coding/decoding is able to transmit and recover é (n) 
at R = H; bits/sample without any distortion. H¢ is the entropy of 

é (n), for which we derived the following expression assuming large-L 
uniform quantizer: 

Equation: 


Hz; = he- ; log, (12var(e (n) — E(n))). 


Since var (e (n) — €(n)) = o2 in DPCM, H; can be rewritten: 
Equation: 


He = he ; logs (1207). 


If e(n) is Gaussian, it can be shown that the differential entropy h, 
takes on the value 
Equation: 


ie. = 5 logs (27e02), 


so that 
Equation: 


WK il lo mea? 
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Using R = Hz and rearranging the previous expression, we find 
Equation: 


With the optimal infinite length predictor, oa equals o | min given by 
equation 10 from Performance of DPCM. Plugging equation 10 from 
Performance of DPCM into the previous expression and writing the 
result in terms of the spectral flatness measure, 


Equation: 
og = 2 Ro SFM,. 
Translating into SNR, we obtain 
Equation: 
oa 
SNR = 10 logio a 


Q 


6.02R — 1.53 — 10 log,, SFM; | [dB]. 


To summarize, a DPCM system using a MSE-optimal infinite-length 
predictor and optimal entropy coding of é (n) could operate at an 
average of R bits/sample with the SNR in [link] (lower equation). 
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Entropy-Encoded DPCM System. 


¢ Comparing [link] (lower equation) and [link] (lower equation), we see 
that DPCM incurs a 1.5 dB penalty in SNR when compared to the 
optimal. From our previous discussion on optimal quantization, we 
recognize that this 1.5 dB penalty comes from the fact that the 
quantizer in the DPCM system is memoryless. (Note that the DPCM 
quantizer must be memoryless since the predictor input must not be 
delayed.) 

e Though we have identified a 1.5 dB DPCM penalty with respect to 
optimal, a key point to keep in mind is that the design of near-optimal 
coders for non-white signals is extremely difficult. When the signal 
Statistics are rapidly changing, such a design task becomes nearly 
impossible. Though still non-trivial to design, near-optimal entropy 
coders for white signals exist and are widely used in practice. Thus, 
DPCM can be thought of as a way of pre-processing a colored signal 
that makes near-optimal coding possible. From this viewpoint, 1.5 dB 
might not be considered a high price to pay. 


Transform Coding: Background and Motivation 
Transform coding is described and an analysis is performed for the simple 2-dimensional case, including a 
comparison to PCM. 


e In transform coding (TC), blocks of N input samples are transformed to N transform coefficients which 
are then quantized and transmitted. At the decoder, an inverse transform is applied to the quantized 
coefficients, yielding a reconstruction of the original waveform. By designing individual quantizers in 
accordance with the statistics of their inputs, it is possible to allocate bits in a more optimal manner, e.g., 
encoding the “more important” coefficients at a higher bit rate. 
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N x N Transform Coder/Decoder with Scalar Quantization 


¢ Orthogonal Transforms: From our perspective, an NV x N “transform” will be any real-valued linear 
operation taking N input samples to N output samples, or transform coefficients. This operation can 
always be written in matrix form 
Equation: 


y(m) =Tx(m), T¢R*% 


where x(m) and y(m) are vectors representing N x 1 blocks of input/output elements: 


Equation: 
x(m) = (x(mN), z(mN-1), ---, a(mN—N-+1))' 
y(m) = (y(mN), y(mN-1), ---, y(mN-N+1)). 
Intuition comes from considering the transform's basis vectors{t;,} defined by the rows of the matrix 
Equation: 
= fs --- 
r= se 
= oes = 


since the coefficient yz, = tix can be thought of as the result of a “comparison” between the k*” basis 
vector and the input x. These comparisons are defined by the inner product <t,,x>= tix which has a 
geometrical interpretation involving the angle 6; between vectors t;, and x. 

Equation: 


<ty,x> = cos (9x) |] te |l2 |] x llo- 


When the vectors {t;,} are mutually orthogonal, i.e., tit 2 =O fork F £, the transform coefficients 
represent separate, unrelated features of the input. This property is convenient if the transform 


coefficients are independently quantized, as is typical in TC schemes. 


Example: 

2x2 Transform Coder 

Say that stationary zero-mean Gaussian source z(m) has autocorrelation r, (0) = 1, rz (1) = p, and 
r, (k) = 0 for k > 1. For a bit rate of R bits per sample, uniformly-quantized PCM implies a mean- 
squared reconstruction error of 

Equation: 


Ze jee 
o = — a SE Ma iteetca fo EE 
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A ae AL 
PCM ie Te 
oe 


For transform coding, say we choose linear transform 


Equation: 
() = va 4) 
T — t = SS 
ti By Od 


Setting x (m) = (a(2m) «(2m —1))‘ and y(m) = Tx(m), we find that the transformed coefficients 
have variance 


Equation: 

2 t 2 1 2 1 

02, = E{\tix(m) 7} = 5B {le (2m) +2(2m—1P} = (2r.(0)+2re(1)) = 1+0 
Equation: 


es, =E {|t}x(m) 7} = 5 E {|x (2m) — 2 (2m —1)P} = 5 (2r2 (0) ~ 2r2(1)) eae ee 


and using uniformly-quantized PCM on each coefficient we get mean-squared reconstruction errors 
Equation: 


Cie Eevee 
Equation: 


7 a eae —2R 
a, = G— pee ™ 
We use the same quantizer performance factor y, as before since linear operations preserve Gaussianity. 
For orthogonal matrices T, i.e., T~! = T*, we can show that the mean-squared reconstruction error 0,7 
equals the mean-squared quantization error: 

Equation: 


o2 = Wy BLE (Wm — b) — 2 (Vm — 8))?} (here N = 2) 
= 5B {|| x(m) —x(m) ||"} 
= 5B { | T7¥(m) —x(m) |") 
= 5 Bf P(y(m) +a(m))—x(m) |?} 
= 5 B{| PPx(m) + Tq (m) —x(m) ||") 
= SE {| T1a(m) |)"} 
= SE af(m)(T)'T a (m) 
= +E {lam |} 
i 


Since our 2 x 2 matrix is indeed orthogonal, we have mean-squared reconstruction error 
Equation: 


1 , Z 
orlno= 5 (1+ p)y22-* + (1 — p22) 


at bit rate of Ro + Ry, bits per two samples. Comparing TC to PCM at equal bit rates (i.e. 
Ro + Ri = 2R), 


Equation: 
o? tc 1 (1+ p)y.2 ° + (1 — p)y.2 2* 

= ( p)Y C p)Y = (Lp pje + (1 = p22). 
o?|pom 2 eo 


[link] shows that (i) allocating a higher bit rate to the quantizer with stronger input signal can reduce the 
average reconstruction error relative to PCM, and (ii) the gain over PCM is higher when the input signal 
exhibits stronger correlation p. Also note that when Ro = R, = R, there is no gain over PCM—a 
verification of the fact that 7 = o7 when T is orthogonal. 


Ratio of TC to PCM mean-squared reconstruction 
errors versus bit rate Ro for two values of p. 


Optimal Bit Allocation 
For transform coding with a fixed total bit rate, the optimal (SNR-maximizing) allocation of bit rates is 
derived. 


e Motivating Question: Assuming that T is an N x N orthogonal matrix, what is the MSE-optimal bitrate 
allocation strategy assuming independent uniform quantization of the transform outputs? In other words, 
what {R,} minimize average reconstruction error for fixed average rate + >, Re? 


Say that the k*" element of the transformed output vector y(n) has variance {or te With uniform 


quantization, example 1 from Background and Motivation showed that the k** quantizer error power is 
Equation: 


2 2 5—2Rx 
oa = Vue Fyn ‘5 
where R;, is the bit rate allocated to the k*” quantizer output and where Yy, depends on the distribution of 
y,. From this point on we make the simplifying assumption that yy, is independent of k. As shown in 


example 1 from Background and Motivation, orthogonal matrices imply that the mean squared 
reconstruction error equals the mean squared quantization error, so that 


Equation: 
eo = ar = is 2 yal 
r N = qk N = Uk 
Thus we have the constrained optimization problem 
Equation: 
N-1 1 Na 
min 022? st R= — Rr. 
tad 2 p> 
Using the Lagrange technique, we first set 
Equation: 
(S022 -r» TR} =0 0 ve 
OR, \ 4 No 
Since 2-22 — (en?) 2% = e ?R:ln2 the zero derivative implies 
Equation: 
r 1 r 
0=-2n2.2°%.6? __ + R,=-=] ——.——_ ] Vé. 
7 Cue 9 82 \ ONG, In 2 
Hence 
Equation: 
1 1 r 1 » 1 2 
R=. Rie l ——.——  ] = -- 1 eee — ] 
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so that 


Equation: 


1 r 1 - 
— 2 
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Rewriting [link] and plugging in the expression above, 
Equation: 


i} d 1 : 
Re = —5 log, (sus) Fy 1082 


1 ™ 1 
= R- 3 log, (11 “, + a log, oy 
k 


o2 
= |R++4 log, | ——** sr | |. 
2 (nite) 


¢ The optimal bitrate allocation expression [link] (lower equation) is meaningful only when Ry > 0, and 
practical only for integer numbers of quantization levels 2~?”* (or practical values of Ry for a particular 
coding scheme). Practical strategies typically 


o set Ry = 0 to when [link] (lower equation) suggests that the optimal R, is negative, 
© round positive R, to practical values, and 
© iteratively re-optimize {R,} using these rules until all Rp have practical values. 


e Plugging [link] (lower equation) into [link], we find that optimal bit allocation implies 
Equation: 


N-1 1/N 
o%, = a ae (i “] V £, 
k=0 


which means that, with optimal bit allocation, each coefficient contributes equally to reconstruction error. 
(Recall a similar property of the Lloyd-Max quantizer.) 


Gain over PCM 

The total reconstruction error of transform coding with optimal bit 
allocation is compared to that of uniformly quantized PCM, and the ratio is 
found to depend on a type of spectral flatness measure. 


e With an orthogonal transform and the optimal bit allocation equation 8 
from Optimal Bit Allocation (lower equation), the total reconstruction 
error equals 
Equation: 


We can compare to uniformly quantized PCM, where 
. Since an orthogonal transform implies 
Equation: 


we have the following gain over PCM: 
Equation: 


Note that the gain is proportional to the ratio between arithmetic and 
geometric means of the transform coefficient variances. (Note 
similarities to the spectral flatness measure.) The factor 

accounts for changes in distribution which affect uniform-quantizer 
efficiency. For example, if T caused uniformly distributed x to become 
Gaussian distributed y;, would contribute a 7 dB loss in TC-to- 
PCM performance. If, on the other hand, x was Gaussian, then y, 
would also be Gaussian and 


The Optimal Orthogonal Transform 
In this module, we establish that for transform coding, the optimum orthogonal tranform is the Karhunen- 
Loeve Transform (KLT). Related properties are also discussed. 


¢ Ignoring the effect of transform choice on uniform-quantizer efficiency y,, Gain Over PCM suggests 
that TC reconstruction error can be minimized by choosing the orthogonal transform T that minimizes 
the product of coefficient variances. (Recall that orthogonal transforms preserve the arithmetic average 
of coefficient variances.) 


Note: 

Eigen-Analysis of Autocorrelation Matrices 

Say that R is the N x N autocorrelation matrix of a real-valued, wide-sense stationary, discrete time 
stochastic process. The following properties are often useful: 


1. R is symmetric and Toeplitz. (A symmetric matrix obeys R = R’, while a Toeplitz matrix has 
equal elements on all diagonals.) 

2. R is positive semidefinite or PSD. (PSD means that x‘Rx > 0 for any real-valued x.) 

3. R has an eigen-decomposition 
Equation: 


R=VAV', 


where V is an orthogonal matrix (V’V = I) whose columns are eigenvectors {v;} of R: 
Equation: 


V = (vovi-:-Vw-1), 


and A is a diagonal matrix whose elements are the eigenvalues {A;} of R: 
Equation: 


A =diag (AoA1 iia AN-1)- 


4. The eigenvectors {;} of R are real-valued and non-negative. 
. The product of the eigenvectors equals the determinant ([[}) Ax = [R|) and the sum of the 
eigenvalues equals the trace (S>j9 Ax = Yo, [R] kk) 


O1 


e The KLT: Using the outer product, 


Equation: 
Yo (m) yo (m)y1 (m) yo (m)yn—1 (m 
; yi (m)yo (m) yj (m) yi (m)yn—1 (m 
y(m)y"(m) = 
yn-1(m)yo(m) yn-1(m)yi(m) --- Yy_1 (mM) 


Using [A] z,k tO denote the k* diagonal element of a matrix A, matrix theory implies 
Equation: 


N-1 N 


Tho. = [fy omy' om}],, 


k=0 k=0 
= E{y(m)y'(m)} 
= TE {x(m)x‘(m)}T* 


BR 


= TT’. TE {x(m)x'(m)}T’-T since T’A|=|A|=|AT| for orthogonal T 
I I 


thus minimization of | [, a would occur if equality could be established above. Say that the eigen- 


decomposition of the autocorrelation matrix of x(n), which we now denote by R,, is 
Equation: 


R, = V,4,V_ 


for orthogonal eigenvector matrix Vx and diagonal eigenvalue matrix Ay. Then choosing | T = Vé 


a, 
otherwise known as the Karhunen-Loeve transform (KLT), results in the desired property: 
Equation: 


E {y(m)y'(m)} = E {Vix (m)x' (m)Vz} = ViReV2 = ViVedcViVe = Ac. 
To summarize: 


1. the orthogonal transformation matrix T minimizing reconstruction error variance has rows equal to 
the eigenvectors of the input's NV x NV autocorrelation matrix, 

2. the variances of the optimal-transform outputs {e.} are equal to the eigenvalues of the input 
autocorrelation matrix, and 

3. the optimal-transform outputs {yo (m), ---, yw—1 (m)} are uncorrelated. (Why? Note the zero- 
valued off-diagonal elements of Ry =E {y (m)y* (m)}.) 


e Note that the presence of mutually uncorrelated transform coefficients supports our approach of 
quantizing each transform output independently of the others. 


Example: 
2x2 KLT Coder 


1 
Recall Example 1 from "Background and Motivation" with Gaussian input having R, = ( ; » Une 
p 


eigenvalues of R, can be determined from the characteristic equation |R — AI| = 0: 
Equation: 


1—2X op 
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The eigenvector Vp corresponding to eigenvalue Ag = 1+ psolves Rzvo = Aovo. Using the notation 


Vo00 V10 
Vo = and Vi= 5 
Vo01 Vi11 


Equation: 
voo + pvo1 = (1+ p)v00 
( ) — Vor = Voo- 
pro +o = (1+ p)v01 
Similarly, Rv, = A,vj yields 
Equation: 
vio + pv11 = (1—p)v10 
j rae a OMe ae 10: 
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For orthonormality, 
Equation: 


Equation: 


Vig + Vy 1 > AY : ( , ) 
tot ¥i1 i : 
V2\-1 
ee t t ofl tL 
Thus the KLT is given by T = V‘, = (vo v1) = Srp se 
Using the KLT and optimal bit allocation, the error reduction relative to PCM is 
Equation: 


Gp RO ah V(i+p)-p) | _ 2 
olpom ‘Ye 7((1+ e) + (1—p)) 7 vi : 


since 7, = Yz for Gaussian x(n). This value equals 0.6 when p = 0. 8, and 0.98 when p = 0. 2 (compare 
to Figure 2 from "Background and Motivation"). 


Performance 

Here we analyze the optimal reconstruction error for transform coding. As the number of channels 
grows to infinity, the performance gain over PCM is shown to depend on the spectral flatness measure. 
Meanwhile, the performance of transform coding with an infinite number of channels is shown to 
equal that of DPCM with an infinite-length predictor. However, when the DPCM predictor length is 
equal to the number of transform coding channels, we show that DPCM always yields better 
performance. 


Asymptotic Performance Analysis 


e Foran N x N transform coder, Equation 1 from "Gain over PCM" presented an expression for 
the reconstruction error variance a rc written in terms of the quantizer input variances {or }. 
Noting the N-dependence on 2 rc in Equation 1 from "Gain over PCM" and rewriting it as 
o?|ro,v, a reasonable question might be: What is o2|7¢,n as N 00? 

e When using the KLT, we know that o, = Xx where A, denotes the k*” eigenvalue of Rz. If we 
plug these c. into Equation 1 from "Gain over PCM", we get 


Equation: 


N-1 1/N 
o*|ro.N - eT] | ; 


k=0 


Writing ([[;, Ax) UN _exp (+> 5; In Ax) and using the Toeplitz Distribution Theorem (see 
Grenander & Szego) 
Equation: 


3. a 1/7 7 
For any f(), fim 5p DFW) = a | f(S: (&*)) de 
with f(-) =In (-), we find that 
Equation: 
1 . ; 
2 TC.N = a ae exp (=. / In S, (caw) 


lim o; 
N->0o 
= yo. 2 SEM, 


where SFM, denotes the spectral flatness measure of x(m), redefined below for convenience: 
Equation: 


exp (+ fc, In S, (e%”) dw) 
iy J, Se (e®)dw 


SFM, = 


Thus, with optimal transform and optimal bit allocation, asymptotic gain over uniformly 
quantized PCM is 
Equation: 


o?|pcm _ YaF 22 2R = 42 oRY—! 
Vy x : 


GTC,N-r00 ~ o?|Tc,v00 Yyo22-22SFM,, 


¢ Recall that, for the optimal DPCM system, 
Equation: 


2 

o?|pom o 

GpPpcoM,N30 = SQ =o? 
O; |DPCM,N—+00 Oe | aii 


where we assumed that the signal applied to DPCM quantizer is distributed similarly to the signal 
applied to PCM quantizer and where o? | min C€notes the prediction error variance resulting from 


use of the optimal infinite-length linear predictor: 


Equation: 
o? | min =X ee [ In S, (e”) dw . 
‘ 2m J_y 


Making this latter assumption for the transform coder (implying 7, = yz) and plugging in o? | 


yields the following asymptotic result: 
Equation: 


min 


=| 
Gtc,N-co = GpPCM,N-00 = SFM, |. 


In other words, transform coding with infinite-dimensional optimal transformation and optimal 
bit allocation performs equivalently to DPCM with infinite-length optimal linear prediction. 


Finite-Dimensional Analysis: Comparison to DPCM 


e The fact that optimal transform coding performs as well as DPCM in the limiting case does not 
tell us the relative performance of these methods at practical levels of implementation, e.g., when 
transform dimension and predictor length are equal and < oo. Below we compare the 
reconstruction error variances of TC and DPCM when the transform dimension equals the 
predictor length. Recalling that 


Equation: 
or 
Gppcm,y-1 = ———— 
Oe|min,N—1 
and 
Equation: 
2 _ [Ry 
ool ni N= 
|IRv-1| 


where Ry denotes the N x N autocorrelation matrix of x(n), we find 
Equation: 


2 [Rv-il 


2 [Ry-2| 
x IRv_-1| ’ 


2 [Rv-3| 


GpPcM,N-3 = 0% ea 
N-2 


GppcM,N-1 = 7 GppcM,N-2 = 7 


Recursively applying the equations above, we find 


Equation: 


N-1 ; 
G — == 
II DPCM, (o7) Rw Ral 


which means that we can write 
Equation: 


g(ia a 
Ry| = (03) (TI Goren] : 
pea 


If in the previously derived TC reconstruction error variance expression 
Equation: 


N-1 1/N 
olron = e(T] | 
70 


we assume that 7 = Yz and apply the eigenvalue property [[, Ag = |R]|, the TC gain over 
PCM becomes 
Equation: 


2 os 
o;|pcm gre ee 
Gion = = 


2) —1/N 
o2\To.N = N-1 
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N-1 1/N 
(I Crews] 
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< Gppcm,: 


The strict inequality follows from the fact that Gppcm,z is monotonically increasing with k. To 
summarize, DPCM with optimal length-N prediction performs better than TC with optimal 

N x N transformation and optimal bit allocation for any finite value of N. There is an intuitive 
explanation for this: the propagation of memory in the DPCM prediction loop makes the effective 
memory of DPCM greater than N, while in TC the effective memory is exactly N. 


Sub-Optimum Orthogonal Transforms 

The KLT, which is the optimal orthogonal transform for transform coding, requires knowledge of the source 
statistics that may be unavailable in practice, and eigendecomposition that may be too expensive in practice. This 
motivates the consideration of other orthogonal transforms, like the DFT, DCT, and DHT. Here we discuss these 
and analyze the resulting performance when possible. In addition, we discuss practical matters like real-valued 
DFTs and fast DCT implementations. 


¢ Goal: Recall that the goal of the optimal orthogonal transform was to minimize the ratio of geometric to 
arithmetic output variances: 
Equation: 


N-1 2 1/N 
( k=0 7 .) 
Wy ya ,, 

2 

y 
values when the difference between the a, (sometimes called the dynamic range of {o?. }) is large. 

e Problem with KLT: The KLT, i.e., the orthogonal transform minimizing [link], is a function of the input signal 
statistics. Specifically, the KLT equals the eigenvector matrix of the input autocorrelation matrix. 
Unfortunately, realistic signals are non-stationary, requiring continual KLT redesign if optimality is to be 
preserved, and eigenvector computation is computationally intensive, especially for large N. Thus, the 
question becomes: Are there fixed orthogonal transforms that do a good job of minimizing the ratio [link] for 
“typical” input signals? As we will see, the answer is yes... 

¢ DET Intuitions: For the sake of intuition, lets first consider choosing T as an orthogonal DFT matrix. In this 
case, the coefficient variances {o;, } would be samples of the power spectrum and the dynamic range of 


The ratio [link] attains its maximum value (= 1) when oy, are equal for all k and takes on much smaller 


{oz} would be determined by the relative input power in different frequency bands. Recalling that 


asymptotic TC (NV — oo) performance is determined by SFM,, which has the same geometric-to-arithmetic- 
average structure as [link]: 
Equation: 


1/N 
exp (Jin. (c)du) (IIe! S (e*)) 
Be 1 ff" § (eiv\d = ee jon : 
lw 2 (e )dws W Duke0 S(e ) 


wp=2rk/N 


we might intuit that the DFT is optimal as N — oo. The asymptotic optimality of the DFT can, in fact, be 
proven (see Jayant & Noll). Of course, we don't have much reason to expect that the DFT would be optimal 
for finite transform dimension N. Still, for many signals, it performs well. (See [link].) 

¢ Other Transforms: The most commonly used orthogonal transform in speech, image, audio, and video coding 
is the discrete cosine transform (DCT). The excellent performance of the DCT follows from the fact that it is 
especially suited to “lowpass” signals, a feature shared by most signals in the previously mentioned 
applications. Note that there are plenty of signals for which the DCT performs poorly—it just so happens that 
such signals are not frequently encountered in speech, image, audio, or video. We will describe the DCT and 
provide intuition regarding it's good “lowpass” performance shortly. Like the DFT, the DCT has fast 
algorithms which make it extremely practical from an implementation standpoint. Another reasonably popular 
orthogonal transform, requiring even less in the way of computation, is the discrete Hadamard transform 
(DHT), also described below. [link] compares DFT, DCT, DHT, and KLT for various transform lengths N 
along with asymptotic TC performance. [link](a) shows gain over PCM when using a lowpass autoregressive 
(AR) source {a(n)} generated from white Gaussian noise {v(n)} via: 
Equation: 


while [link](b) shows the gain for highpass {x(n)}: 


Equation: 


1 


1+ 0.827! 


See [link] for the power spectra of these two processes. 


V (2). 


Gto,n for various transforms and various N on an AR(1) lowpass (left) and 
highpass (right) sources. 


Power spectra of AR(1) sources used in the 
transform matrix comparisons of [link]. 


e The DHT: The N x N DHT is defined below for power-of-two N: 


Equation: 
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H, = —— 
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Note that Hy is orthogonal| footnote], i.e., H nH = I. As an example 


Equation: 


Hy 
Hy 


Hy 
—Hy 
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[link] illustrates DHT basis vectors for the case VV = 8. The primary advantage of the DHT is that its 
implementation can be accomplished very efficiently. [link] suggests that the DHT performs nearly as well as 
the KLT for N = 2 and 4, but its performance falls well short of optimal for larger N. 


DHT 


8 x 8 DHT basis vectors. 


Caution: outputs of the Matlab command hadamard must be scaled by 1/ WN to produce orthogonal DHT 
matrices! 

The DFT; The normalized[footnote] DFT from {z,,} to {yx} is defined below along with its inverse. 
Equation: 


—— sie 

Ye = —= tne on™. k= Q---N-1, 
N n=0 
1. eae 

Lp, = — yer. n=0---N-1. 
N 420 


The normalized DFT can be represented by a symmetric unitary[footnote] matrix Wy: 
Equation: 


[Wallin = einm™. kn=0---N-1. 


By unitary, we mean that W vw =I, where (.)" denotes complex conjugation. Note that a unitary matrix 
is the complex-valued equivalent of an orthogonal matrix. In practice, the N x N DFT is implemented using 
the fast Fourier transform (FFT), which requires = a log, N complex multiply/adds when N is a power of 
two. 

Due to the norm-preserving scale factor 1/ VN, the DFT definitions above differ from those given in most 
digital signal processing textbooks. 

Outputs of the Matlab command dftmtx must be scaled by 1/ VN to produce unitary DFT matrices. 

The Real-Valued DFT: Since we assume real-valued x,,, complex-valued DFT outputs y, might seem 
problematic since transmitting both real and imaginary components would decrease our transmission 
efficiency. For a real-valued DFT input, however, the DFT outputs exhibit conjugate symmetry which allows 
us to represent the N complex valued outputs with only N real-valued numbers. More precisely, real-valued 
DFT input {z,,} implies that DFT output {y;,} has the property 

Equation: 


which implies 

Equation: 
Re{yz} = Re{yn_x} k= 1,2,---,.N/2, 
Im{yz} = —Im{yy-e} k=1,2,---,N/2, 
Im{yo} = Im{ywj2} = 0. 


A good method by which to select non-redundant components of the DFT output is: 


1. Compute complex-valued {y;,} using the standard DFT. 
2. Construct real-valued {y;,} from {y;} as follows: 
Equation: 
Yo = yo (ER) 
Re pos V/2 
y% = v2Im{y} 
yy = v2 Re{y} 


yy-3 = V2Im {yn} 
yy-2 = V2Re{ynj21} 
Yv-1 = Yn2 (ER). 


The method above is convenient because (i) it preserves the frequency ordering of the DFT and (ii) it 
preserves the norm of the DFT output vector, i.e., || y’ ||=|| y ||. Using the conjugate symmetry property of 
DFT outputs, we can write the transformation from {y;,} from {y;,} as a matrix operation U y: 

Equation: 


1 0 0 0 0 0 0 ah 
0 -j/V2 0 0 0 0 3/Vv2 Re el 
0 1/V2 0 0 0 Of  tf/2 : 
0 oO -j/v2 0 0 j/v2 0 Re { We 
y=]0 0 1/Vv2 0 0 1/v2 0 a 
: Re 2-17 — 
0 + 0 -j/V¥2 0 j/V2 0 vr 
0 wee OD DiAl2- OV TQ Or gz 0 
e {yo} — 
0 omar iG O° -I- &@ 0 
Re{y, } — 
{yi} 


The normalization feature guarantees that U y is unitary (which is easily checked by inspection). Then 

Uy Wy, the product of two unitary matrices, is also unitary. Since Uj Wy is actually real-valued (since it 
takes any real-valued x to a real-valued y) it should be referred to as orthogonal rather than unitary. 
Henceforth we rename U y W w the real-valued DFT matrixWy, 

Equation: 


Wy = Unwy. 


It is easily checked that the basis vectors of W%, are sampled sines and cosines at the uniformly spaced 
frequencies {27k/N; k =0,---,.N/2}. [link] gives an illustration of the real-valued DFT basis vectors for 
the case N = 8. 


8 x 8 Real-valued DFT basis vectors. 


Example: 
[DFT and Real-valued DFT for NV = 4] 
Equation: 
uo oat atl 3 4 
jj —=y = 4 =1 V2 4-93) 2 
Wa = d : F x 5 Wx = / a3 J / 
21 =1 1 =! 3 
19 -l - 2 —1/2 — 73/2 
Equation: 
il 1 1 1 @ 4 
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Recall that when N is a power of 2, an N-dimensional complex-valued FFT requires ~ x log, N complex 
multiply/adds. When the input is real, however, an N-dimensional FFT may be computed using ~ N log, N 
real multiply/adds (see Sorensen & Jones & Heideman & Burrus TASSP 87). 

The DCT: The DCT is defined below along with its inverse 

Equation: 


(2n + 1)ka 
Yk = [Fan Yn cos! Z Cn er k=0---N-1, 


for ag = 1/V2, anz9 = 1, 


(2n + 1)kx 
t = eS n=0---N-1, 


The DCT can be represented by an orthogonal matrix Cy: 
Equation: 


a 2n+1)k 
= [2 a4 cos eee kn =0---N-1. 
N 2N 


See [link] for an illustration of DCT basis vectors when N = 8. 


ocT 


8 x 8 DCT basis vectors. 


e A Fast DCT: There are a number of fast algorithms to compute the DCT. The method presented below is 
based on the FFT and leads to intuition concerning the good “lowpass” performance of the DCT. 


1. Create a 2N-length mirrored version of N-length {z,,} (see [link](c)): 
Equation: 


fie In n=0,1,---,N—1, 
Be | eine ane, WEN NEA 2, ON, 


2. Compute {y;,}, the 2V-point DFT of {z, }: 


Equation: 
2N-1 [> 
YU = —— ven 2 Dy, e Jaw kn — ow cos —_= 


3. Compute {y;,}, the N-point DCT outputs, from {yz}: 
Equation: 


Yk = e J an Gn; k=0,1,---,N-1. 


Assuming a real-valued input, the scheme outlined above can be implemented using 
Equation: 


2N 
~2N + 2N log, = = 2N (1+ log, N) real-valued multiply /adds 


where we are assuming use of the real-FFT described previously. 

DCT vs. DFT Performance for Lowpass Signals: [link] suggests that DCT and DFT performance both equal 
KLT performance asymptotically, i.e., as transform dimension NV — oo. Indeed, this can be proven (see 
Jayant & Noll). A more practical question is: How do DCT and DFT performances compare for finite N? To 
answer this question, we will investigate the effects of input data block length on the DCT and DFT. To start, 
consider the DFT of an N-length input block {a,---,2y_1}: 


Equation: 
1 3 if 
= —jqekn, = 
Yk = —— tne N's k=O0---N-1. 
VN n=0 
It can be seen that the DFT outputs {Xo,---,X_i} are samples of the discrete-time Fourier transform 
(DTFT) at frequencies {wr — = k; k=0--- N-1}: 
Equation: 
a Sane (0 
X(w) = — xr,e I", Yn = X atk ; 
N n=0 
Now lets consider a periodic extension of {,---, £—1} which repeats this N-length sequence a total of L 
times: 
Equation: 
1 NL NL 
f= | yr SPs 
0 else 


Above, (7) ,, denotes “n modulo N” and L is assumed even (see [link](a)-(b)). Here is the interesting point: 
the DTFT of the NL-length periodic extension equals the DTFT of the original N-length data block when 
sampled at the frequencies {w;}! 


Equation: 
1 NERA : 
X'(w) = — Yo ate dtm 
VN n=“Nb/2 


1 L/2-1 N-1 
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This implies that the overall spectral shape of X’ (w) will be inherited by the DFT outputs {Xo,---, Xy_i}. 
So what is the overall shape of X’ (w)? Say that {x,,} is a lowpass process. Being lowpass, we expect the 
time-domain sequence {z,,} to look relatively “smooth.” If the starting and ending points of the N-block, are 


different, however, the periodic extension {z/, } will exhibit time-domain discontinuities (see [link](b)) that 
are uncharacteristic of the process {x,,}. These discontinuities imply that X’ (w) will contain high-frequency 
content not present in the power spectrum of the lowpass input process. Based on our previous findings, if 
artificial high-frequency content exists at X’ (w;,), it must also exist at _X (w;,) = Xj}. In conclusion, the 
periodic extension {z/, } provides an intuitive explanation of why short-block DFT analysis of lowpass 
signals often seems corrupted by “artificial” high-frequency content. So why is this important? Recall that 
transform performance is proportional to the dynamic range of transform output variances. If the DFT outputs 
corresponding to otherwise low spectral energy are artificially increased due to short-window effects, the 
dynamic range of {a2} will decrease, and DFT performance will fall short of optimal. 


(a) 


Illustration of periodic extensions inherent to DFT and 
DCT: (a) N-length DFT input block, (b) periodic 
extension inherent to DFT, (c) equivalent 2.V-length 
DFT-input-block for DCT, (d) periodic extension 
inherent to DCT. 


Now lets consider the DCT. From derivation of the fast algorithm, we know that the N DCT output 
magnitudes from length-N input {2o, ---, £—1} are equal to the first N DFT output magnitudes from length- 
2N input {Z,,}—a mirrored version of {xo, ---, £—1}. (See [link](c).) Due to the mirroring effect, the 
periodic extension of {%,,} will not have the discontinuities present in the periodic extension of {z,}, and so 
a 2N-point DFT analysis of {%,, } will not have “artificial” high frequency enhancement. Assuming that the 
process from which {z,,} was extracted is lowpass, the DCT outputs will exhibit large dynamic range, and an 
improvement over DFT coding performance is expected. This is confirmed by [link](a). 


Introduction and Motivation 
In this module, we give a brief introduction to sub-band coding, its relation to transform 
coding, and its use in MPEG-style audio coding. 


¢ Sub-band coding is a popular compression tool used in, for example, MPEG-style 
audio coding schemes (see [link]). 
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Simplified MPEG-style audio coding system. 


e [link] illustrates a generic subband coder. In short, the input signal is passed through 
a parallel bank of analysis filters { H; (z)} and the outputs are “downsampled” by a 
factor of N. Downsampling-by-N is a process which passes every N“” sample and 
ignores the rest, effectively decreasing the data rate by factor N. The downsampled 
outputs are quantized (using a potentially different number of bits per branch—as in 
transform coding) for storage or transmission. Downsampling ensures that the 
number of data samples to store is not any larger than the number of data samples 
entering the coder; in [link], N sub-band outputs are generated for every N system 
inputs. 
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Sub-band coder/decoder with scalar quantization. 


¢ Relationship to Transform Coding: Conceptually, sub-band coding (SC) is very 
similar to transform coding (TC). Like TC, SC analyzes a block of input data and 


produces a set of linearly transformed outputs, now called “subband outputs.” Like 
TC, these transformed outputs are independently quantized in a way that yields 
coding gain over straightforward PCM. And like TC, it is possible to derive an 
optimal bit allocation which minimizes reconstruction error variance for a specified 
average bit rate. In fact, an N-band SC system with length-N filters is equivalent to 
a TC system with N x N transformation matrix T: the decimated convolution 
operation which defines the 7” analysis branch of [link] is identical to an inner 
product between an N-length input block and t, the i” row of T. (See [link].) 


(a) 
TS 


Equivalence between (a) N-band sub-band coding with length-N filters and (b) 
N x N transform coding (shown for NV = 4). Note: impulse response 
coefficients {h,,} correspond to filter H; (z). 


So what kind of frequency responses characterize the most-commonly used 
transformation matrices? Lets look at the DFT first. For the i” row, we have 
Equation: 
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[link] plots these magnitude responses. Note that the i*” DFT row acts as a 
bandpass filter with center frequency 277/N and stopband attenuation of ~ 6 dB. 
[link] plots the magnitude responses of DCT filters, where we see that they have 
even less stopband attenuation. 
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Magnitude responses of DFT basis vectors for 
N= Ss. 


Magnitude responses of DCT basis vectors for 
N =. 


¢ Psycho-acoustic Motivations: We have seen that N-band SC with length-N filters is 
equivalent to N x N transform coding. But is transform coding the best technique 
to use in high quality audio coders? It turns out that the key to preserving sonic 
quality under high levels of compression is to shape the reconstruction error so that 
the ear will not hear it. When we talk about psychoacoustics later in the course, 
we'll see that the properties of noise tolerated by the ear/brain are most easily 
described in the frequency domain. Hence, bitrate allocation based on 
psychoacoustic models is most conveniently performed when SC outputs represent 
signal components in isolated frequency bands. In other words, instead of allocating 
fewer bits to sub-band outputs having a smaller effect on reconstruction error 
variance, we will allocate fewer bits to sub-band outputs having a smaller 
contribution to perceived reconstruction error. We have seen that length-N DFT and 
DCT filters give a 27/N bandwidth with no better than 6 dB of stopband 
attenuation. The SC filters required for high-quality audio coding require much 
better stopband performance, say > 90 dB. It turns out that filters with passband 
width 27/N, narrow transition bands, and descent stopband attenuation require 
impulse response lengths >> NV. In N-band SC there is no constraint on filter length, 
unlike N-band TC. This is the advantage of SC over TC when it comes to audio 
coding|[footnote]. 
A similar conclusion resulted from our comparison of DPCM and TC of equal 
dimension N; it was reasoned that the longer “effective” input length of DPCM with 
N-length prediction filtering gave performance improvement relative to TC. 

¢ To summarize, the key differences between transform and sub-band coding are the 
following. 


1. SC outputs measure relative signal strength in different frequency bands, while 
TC outputs might not have a strict bandpass correspondence. 

2. The TC input window length is equal to the number of TC outputs, while the 
SC input window length is usually much greater than number of SC outputs 
(16x greater in MPEG). 


e At first glance SC implementation complexity is a valid concern. Recall that in TC, 
fast NV’ x N transforms such as the DCT and DFT could be performed using 
~ N log, N multiply/adds! Must we give up this computational efficiency for 
better frequency resolution? Fortunately the answer is no; clever SC 
implementations are built around fast DFT or DCT transforms and are very efficient 
as a result. Fast sub-band coding, in fact, lies at the heart of MPEG audio 
compression (see ISO/IEC 13818-3). 


Fundamentals of Multirate Signal Processing 

Here we present some background material on multirate signal processing that is necessary to 
understand the filterbank processing used in sub-band coding. In particular, we describe modulation, 
upsampling, and downsampling in several domains: the time-domain, z-domain, and DTFT domain. 
In addition, we describe the aliasing phenomenon. 


The presence of upsamplers and downsamplers in the diagram of Figure 2 from "Introduction and 
Motivation" implies that a basic knowledge of multirate signal processing is indispensible to an 


understanding of sub-band analysis/synthesis. This section provides the required background. 


e Modulation: 
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Modulation using e4”°” 


[link] illustrates modulation using a complex exponential of frequency @,. In the time domain, 


Equation: 
y(n) = a(njeMnr |. 
In the z-domain, 
Equation: 
¥i(2)..S S- y(n)z * = >. (a (nje™"\z" = is x(n) (e#>z) — Ale Pz) |, 
We can evaluate the result of modulation in the frequency domain by substituting z = e””. This 
yields 
Equation: 


Y(w) = So y(nje * = | X(w — wo) |. 


Note that X(w — w,) represents a shift of X(w)up by @, radians, as in [link]. 
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Frequency-domain effect of modulation by 
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Upsampling: 
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Upsampling by N. 


[link] illustrates upsampling by factor NV. In words, upsampling means the insertion of N—1 
zeros between every sample of the input process. Formally, upsampling can be expressed in the 
time domain as 

Equation: 


z(n/N) whenn=MmMN forme Z 
y(n) = 


0 else. 


In the z-domain, upsampling causes 
Equation: 


¥i(z) = aime" a Soa (m)z =| XZ"), 
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and in the frequency domain, 
Equation: 


As shown in [link], upsampling shrinks X(w) by a factor of N along the axis. 


Y(w) = X(Nw) 


Frequency-domain effects of upsampling by N =2. 


¢ Downsampling: 


x(n) ——>(|N}—> y(m) 


Downsampling by N. 


[link] illustrates downsampling by factor N. In words, the process of downsampling keeps 
every NV" sample and discards the rest. Formally, downsampling can be written as 
Equation: 


[ ym) = #(miN)- 


In the z domain, 
Equation: 


where 
Equation: 


a= z(n) whenn=mN forme Z 
~ lO else. 


The neat trick 
Equation: 


en jmp 1 when n=mN formeZ 
——r e — 
N 0 else 


(which is not difficult to prove) allows us to rewrite % (7) in terms of x(n): 
Equation: 


i ee Sait 
YQ; = S > a(n) goseee Zz 
1 
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Translating to the frequency domain, 
Equation: 


As shown in [link], downsampling expands each 27-periodic repetition of X (w) by a factor of 
N along the w axis. Note the spectral overlap due to downsampling, called “aliasing.” 
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Y(w) 
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Frequency-domain effects of downsampling by N =2. 


e Downsample-Upsample Cascade: 


2(n) y(n) 


N-Downsampler followed by N-upsampler. 


Downsampling followed by upsampling (of equal factor N) is illustrated by [link]. This 
structure is useful in understanding analysis/synthesis filterbanks that lie at the heart of sub- 
band coding schemes. This operation is equivalent to zeroing all but the mN* samples in the 
input sequence, i.e., 

Equation: 


z(n) whenn = mN forme Z 
y(n) = 
0 else. 


Using trick [link], 
Equation: 
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which implies 
Equation: 


The downsampler-upsampler cascade causes the appearance of 27 /N-periodic copies of the 
baseband spectrum of X(w). As illustrated in [link], aliasing may result. 


X (w) 


Frequency-domain effects of downsampler-upsampler cascade for 
N= 2, 


Uniformly-Modulated Filterbanks 


Filterbanks with uniform sub-band bandwidth are described, and the tradeoff between reconstruction error 
and frequency-selectivity is discussed. The polyphase/DFT filterbank implementaiton is also discussed, 


along with its computational cost. 


e Perfect Reconstruction Filterbanks: Recall that in our study of transform coding, we restricted our 


attention to orthogonal transformation matrices. Orthogonal matrices had the property that, in the 


absence of quantization error, the reconstruction error was zero. For sub-band coding, “perfect 


reconstruction” (PR) filterbanks (FBs) are analogous to orthogonal matrices. Specifically, a PR-FB is 
defined as an analysis/synthesis structure which gives zero reconstruction error when synthesis stage is 
fed exact (unquantized) copies of analysis outputs. Initially we consider the design of ideal sub-band 


analysis and synthesis FBs and later the design of practical FBs. For the purpose of FB design we 


ignore the effects of quantization error. Our rational is as follows: the absence of quantization error 
corresponds to the high bit-rate scenario, in which case we desire that the filtering operations inherent 
to sub-band coding introduce little or no error of their own. Removing the quantizers from Figure 2 


from "Introduction and Motivation", we obtain the analysis/synthesis FBs in [link]. 
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N-band analysis and synthesis filterbanks. 


Uniform Modulation: The most conceptually straightforward FB is known as the “uniformly 
modulated” FB. Uniform modulation means that all branches isolate signal components in non- 
overlapping frequency bands of equal width 27/N. We will assume that the i‘ branch has its 
frequency band centered at w; = 2im/N. (See [link].) 


2n/N w; = 2ni/N 


Frequency bands for the uniformly-modulated filterbank ( 
N=4). 


u(n) 


¢ Analysis FB: The i” frequency range may be isolated by modulating the input spectrum down by @; 
and lowpass filtering the result. (See the first two stages of the analysis bank in [link].) The ideal 
lowpass filter has linear phase and magnitude response that is unity for w € (—a/N,7/N) and zero 
elsewhere. (See [link].) 


|H(w)| 
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Ideal (dashed) and typical (solid) prototype-filter magnitude 
responses for the uniformly modulated filterbank. 


With ideal lowpass filtering, the resulting signals have (double-sided) bandwidths that are N times 
smaller than the sampling rate. Nyquist's sampling theorem (see Oppenheim & Schafer) says that it is 
possible to sample signals with this bandwidth at 1/N times the filter output rate without loss of 
information. This sample rate change can implemented via downsampling-by-N, resulting in the 
analysis FB of [link]. Note that the downsampling operation does not induce aliasing when the 
analysis filter is the ideal lowpass filter described above. 

Synthesis FB: To reconstruct the input signal x(n), the synthesis FB must restore the downsampled 
signals to their original sampling rate, re-modulate them to their original spectral locations, and 
combine them. Upsampling is the first stage of sampling-rate restoration. Recall from Equation 18 
from "Fundamentals of Multirate Signal Processing" (and Figure 8 from "Fundamentals of Multirate 
Signal Processing") that a downsampler-upsampler cascade creates N —1 additional uniformly-spaced 
spectral copies of the original baseband spectrum. Thus, to remove the unwanted spectral images, an 
“anti-imaging” lowpass filter is applied to each upsampler's output. Ideally, this lowpass filter is linear 
phase with magnitude response that is unity for w € (—7/N,2/N) and zero elsewhere; the same 
specifications given for the ideal analysis filter. (See [link].) As shown in [link], re-modulation is 
accomplished by shifting the i*” branch up by «;. When the analysis and synthesis filters share a 
common phase response, the re-modulator outputs can be combined coherently by a simple 
summation. Under all of these ideal conditions, the output signal u(7) is a potentially delayed (but 
otherwise perfect) copy of the input signal x(n): 

Equation: 


u(n) = a(n—6) for some delay 6. 
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N-band modulated analysis/synthesis filterbanks. 


e Effect of Non-Ideal Filtering: In practice, the analysis and synthesis filters will not have ideal lowpass 


responses, and thus the reconstructed output u(7) will not necessarily equal a delayed version of the 
input z(n). These shortcomings typically result from filter implementations based on a finite number 
of design parameters. (See [link] for a typical lowpass filter magnitude response.) It should be noted 
that, under certain conditions, it is possible to design sets of analysis filters {H; (z)} and synthesis 
filters {K; (z)} with finite parameterizations which give the “perfect reconstruction” (PR) property 
(see Vaidyanathan). Though such filters guarantee PR, they do not act as ideal bandpass filters and 
thus do not accomplish perfect frequency analysis. (Consider the length-N DFT and DCT filter 
responses: by the orthogonal matrix argument, these are perfectly reconstructing, but from Figure 4 
from "Introduction and Motivation" and Figure 5 from "Introduction and Motivation", they are far 
from perfect bandpass filters!) Due to their limited frequency-selectivity, none of the currently-known 
PR filterbanks are appropriate for high-quality audio applications. As a result, we focus on the design 
of filterbanks with 


1. near-perfect reconstruction and 
2. good frequency selectivity. 


As we will see, it is possible to design practical filters with excellent frequency selectivity and 
responses so close to PR that the smallest quantization errors swamp out errors caused by non-PR 
filtering. 

Polyphase/DFT Implementation: When H(z) and K(z) are length-M FIR filters, the unique elements 
in [link] are the N uniform-modulation coefficients { ef2nn/ N:n=0,---,N— 1} and the 2M the 
lowpass filter coefficients {h, } and {k,,}. It might not be surprising that each half of the uniformly- 
modulated FB has an implementation that requires only one N-dimensional DFT and M multiplies to 
process an N-block of input samples. [link] illustrates one such implementation, where the 
“polyphase” filters {H (z)} and {AK (z)} are related to the “prototype” filters H(z) and K(z) 
through the impulse response relations: 

Equation: 


£ 
ho := hmne 


i for £=0,---,N—1. 
km := Rmn+e 


The term “polyphase” comes about because the magnitude responses of well-designed {H (z)} and 
{K (z)} are nearly flat, while the slopes of the phase response of these filters differ by small 
amounts. The equivalence of [link] and [link] will be established in the homework. 
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Polyphase/DFT implementation of N-band uniformly modulated 
analysis/synthesis filterbanks. 


Recognize that the filter computations in [link] occur on downsampled (i.e., low-rate) data, in contrast 
to those in [link]. To put it another way, all but one of every N filter outputs in [link] are thrown away 
by the downsampler, whereas none of the filter outputs in [link] are thrown away. This reduces the 
number of required filter computations by a factor of N. Additional computational reduction occurs 
when the DFT is implemented by a fast transform. Below we give a concrete example. 


Example: 

Computational Savings of Polyphase/FFT Implementation) 

Lets take a look at how many multiplications we save by using the polyphase/DFT analysis filterbank 
in [link] instead of the standard modulated filterbank in [link]. Here we assume that N is a power of 2 
(see Sorensen, Jones, Heideman & Burrus TASSP 87), so that the DFT can be implemented with a 
radix-2 FFT. With the standard structure in [link], modulation requires 2N real multiplies, and 
filtering of the complex-valued modulator outputs requires 2 x N x M additional real multiplies, for 
each input point x(n). This gives a total of 

Equation: 


2N(M +1) | real multiplies per input. 


In the polyphase/FFT structure of [link], it is more convenient to count the number of multiplies 
required for each block of N inputs since each new N-block produces one new sample at every filter 
input and one new N-vector at the DFT input. Since the polyphase filters are each length-M/N, 
filtering the block requires N x M/N = M real multiplies. Though the standard radix-2 N- 
dimensional complex-valued FFT uses = log, N complex multiplies, a real-valued N-dimensional 
FFT can be accomplished in N log, N real multiplies when N is a power of 2. This gives a total of 
Equation: 


M +N log, N real multiplies per N inputs, or | M/N-+ log, N | real multiplies per input! 


Say we have N = 32 frequency bands and the prototype filter is length MM = 512 (which turn out to 
be the values used in the MPEG sub-band filter). Then using the formulas above, the standard 


implementation requires | 32832 | multiplies per input, while the polyphase/DFT implementation 


requires only | 21 |! 


MPEG Layers 1-3: Cosine-Modulated Filterbanks 

Here the "polyphase quadrature" filterbank used in the MPEG audio standards is described in great detail. It has 
the following practical features: real-valued sub-band outputs, near-perfect reconstruction, and polyphase 
implementation; and is based on cancellation of adjacent sub-band interference. 


e Though the uniformly modulated filterbank in Figure 4 from "Uniformly-Modulated Filterbanks" was 
shown to have the fast implementation in Figure 5 from "Uniformly-Modulated Filterbanks", the sub-band 
outputs are complex-valued for real-valued input, hence inconvenient (at first glance[footnote]) for sub- 
band coding of real-valued data. In this section we propose a closely related filterbank with the following 
properties. 


1. Real-valued sub-band outputs (assuming real-valued inputs), 
2. Near-perfect reconstruction, 
3. Polyphase/fast-transform implementation. 


This turns out to be the filterbank specified in the MPEG-1 and 2 (layers 1-3) audio compression standards 
(see ISO/IEC 13818-3). 

In the structure in Figure 4 from "Uniformly-Modulated Filterbanks", it would be reasonable to replace the 
standard DFT with a real-valued DFT (defined in the notes on transform coding), requiring ~ N log, N 
real-multiplies when N is a power of 2. Though it is not clear to the author why such a structure was not 
adopted in the MPEG standards, the cosine modulated filterbank derived in this section has equivalent 
performance and, with its polyphase/DCT implementation, equivalent implementation cost. 


Filter Design 


e Real-valued Sub-band Outputs: Recall the generic filterbank structure of Figure 1 from "“Uniformly- 
Modulated Filterbanks". For the sub-band outputs to be real-valued (for real-valued input), we require that 
the impulse responses of {.H; (z)} and {K; (z)} are real-valued. We can insure this by allocating the N 
(symmetric) frequency band pairs shown in [link]. The positive and negative halves of each band pair are 


2i+1 : 
centered at w; = ( 5 a radians. 


a/N 


= @itlr 
w= "ON 


Frequency band pairs for the polyphase quadrature 
filterbank (NV = 4). 


We can consider each filter H; (z) as some combination of symmetric positive-frequency and negative- 
frequency components 
Equation: 


Hi; (z) = a;F; (z) + 0G; (z) 


as shown in [link]. 


+> w 


(2i+1)x/N 


Positive- and negative-frequency decomposition of 
Hi; (w). Note K; (w) will have a similar, if not 
identical, frequency response. 


When 6; = a, and the pairs {F; (z), G; (z)} are modulated versions of the same prototype filter H(z), we 
can show that H; (z) must be real-valued: 
Equation: 


H,(z) = aH (e352) + a; (H(e* oN 2) 


Fi(2) Gi(z) 
* 
a; S> hye an M2” + a; S- hae aw "2" 
n n 


Re (a;) > Anz "(ein aitn +e inti) + jIm (a;) S- Rnz "(ein titn — e iran t ") 
i 
Re (a;) eB hnz "+ 2-cos (« n) + jIm (a;) S- hnz "+27 sin (= as n) 
2i+1 2i+1 
= 2 = ke (a;) cos (« a n) —Im(a;) sin (« - n)] hye” 


e Aliasing_Cancellation: Recall again the generic filterbank in Figure 1 from “Uniformly-Modulated 
Filterbanks". Here we determine conditions on real-valued {H; (z)} and {.K; (z)} which lead to near- 
perfect reconstruction. It will be insightful to derive an expression for the input to the i” reconstruction 
filter, {y; (rn) }. The downsample-upsample-cascade equation Equation 14 from "Fundamentals of 
Multirate Signal Processing" (fourth equation) implies that 


Equation: 
, Na 
Y¥,(z) = = X;(e*¥"2) 
N ver 
: SH (c-?¥2) x (e172) 
N = 
1 — . ln * . on <n 
= — JaiF (e#¥?2) + a; Gi(e*¥P2) | x(eH¥? 2) 
N = 
1 _ On * - Qn ; On 
= [er (z) +a, te? (z) )x@+ oc F,(e/¥?2) + a; Gi(e#¥P2) | x(eP¥P 2). 


desired 
undesired images 


Thus the input to the i“ reconstruction filter is corrupted by unwanted spectral images, and the 
reconstruction filter's job is the removal of these images. The reconstruction filter A; (z) will have a 
bandpass frequency response similar (or identical) to that of H; (z) illustrated in [link]. Due to the practical 
design considerations, neither A; (z) nor H; (z) will be perfect bandpass filters, but we will assume that 
the only significant out-of-band energy passed by these filters will occur in the frequency range just outside 
of their passbands. (Note the limited “spillover” in [link].) Under these assumptions, the only undesired 
images in Y; (w) that will not be completely attenuated by A; (w) are the images adjacent to F;; (w) and 

G; (w). Which indices p in [link] (third equation) are responsible for these adjacent images? [link] (third 
equation) implies that index p = £ shifts the frequency response up by 27rf/N radians. Since the passband 


centers of F’; (z) and G; (z) are (2i + 1)a/N radians apart, the passband of G; (e WP z) will reside 
directly to the left of the passband of F’; (z) when p = i. Similarly, the passband of G; (3 #2) will 
reside directly to the right of the passband of F; (z) when p = 7 + 1. See [link] for an illustration. Using 
the same reasoning, the passband of F;; (e #P 2) will reside directly to the right of the passband of G; (z) 


when p = —7 and directly to the left when p = —(i + 1). The only exceptions to this rule occur when 
i = 0, in which case the images to the right of G; (z) and to the left of F; (z) are desired, and when 
i = N —1, in which case the images to the left of G; (z) and to the right of F; (z) are desired. 
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Spectral images of Y; (w) not completely attenuated by K; (w). 


Based on the arguments above, we can write {u; (n)}, the output of the 2“ reconstruction filter, as follows: 
Equation: 


Uil2) = Ki(2¥i (2) 
1 * 
= 2K, (2) [aF @)X (2) + 4,6; (2)X(2)| 
desired 
1 (2m 5 On 7 42m; _ gan, 
+5, Ki (z) la.F, (e! Fz) X(e! Fz) +a,;G; (c J Hz) X(e J #2) 
aliasing from inner undesired images when 1<i<N—1 
+OK, (z) la.F, (cf (+0)2) X(ei#(40)2) -- a,G, (e960 2) x (eH) 2) | . 


aliasing from outer undesired images when 0<i<N—2 


The previous equation shows that U; (z) is corrupted by the portions of the undesired images not 
completely removed by the reconstruction filter K; (z). In the filterbank context, this undesired behavior is 
referred to as aliasing. But notice that aliasing contributions to the signal U (z) = }); U; (z) will vanish if 
the inner aliasing components in U; (z) cancel the outer aliasing components in U;_1 (z). This happens 
when 

Equation: 


K; (z) laiF, (ci#*2) x(ei#z) + a, Gi (cz) x(eHFz) | 
= —K;j-1 (z) aia Fia (c/¥42) x(el#iz) + a, ,Gi-1 (c#¥z) x(e9F*z) |. 


which occurs under satisfaction of the two conditions below. 
Equation: 


a,K; (z)F; (c/#2) = —a;_1Kj_1 (z)Fi-1 (ci#z) 
a, K; (z)G; (e#¥*2) = —a,_,Ki-1 (z)Gi-n (e?¥*2), 
We assume from this point on that the real-valued filters {H; (z)} and {K; (z)} are constructed using 
modulated versions of a lowpass prototype filter H(z). (This assumption is required for the existence of a 


polyphase filterbank implementation.) 
Equation: 


Aya) .= ak; (z) +.a,G; (z) { — H (e~d2x @i+1) z) 
* w. ee 
Ki(2) = oF, (z)+¢,G; (2) Gi(z) = H(eiae i+) z) 


Then condition [link] (upper equation) becomes 
Equation: 


aici (oF 2) H (ef -Y 7) + aie; H (ef) 2) H (eh) 

= =a; -10;-1H (eFOYz) H (eH 2) — ay16, 4H (eh) 2) Helse) 2), 

Lets take a closer look at the products H (e~J2w2#+1) z) H(ed2v 2+) z) in the previous equation. As 
illustrated in [link], these products equal zero when 1 < i < N/2 since their passbands do not overlap. 


Setting these products to zero in [link] (bottom equation) yields the condition 
Equation: 


* 


* 
ajc; = —aj_1¢;_,forl <i< N-1} 


which can also be shown to satisfy [link] (bottom equation). 
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Illustration of vanishing terms in [link] (lower 
equation). 


Next we concern ourselves with the requirements on dg and cg. Assuming [link] is satisfied, we know that 
inner aliasing in U; (z) cancels outer aliasing in U;_ (z) for 1 < i < N — 1. Hence, from [link] (fourth 
equation) and [link] (lower equation), 

Equation: 


U(z) = U; (z) 


7 ~ SK; (2) Hi (2)X (2) 
1 N 
N =0 

ak +a; (H(t id ae z)|x (z) 


leu (ear +1) ) + oH (ef) | 


Noting that the passbands of H(e? ay 24+1) z) and H(e?2 ay (2-+1) z) do not overlap for 1 <7 < N — 2, we 
have 


Equation: 
1 * * oie te 
U(z) = WN (aves F aco) H(e JIN z)H(e! IN z) 
Be (anew -- ay-sen-1) H (PON) 2) (elt ON) 2) 
N-1 


1S ost (e+ oe.) LC, 
i=0 


The first two terms in [link] (third equation) represent aliasing components that prevent flat overall 
response at w = 0 and w = 77, respectively. These aliasing terms vanish when 


Equation: 
* * 
* * 
Q@N-1Cn-1 = —@y_iCN-1 
What remains is 
Equation: 
— jE (241 * OF 2 (jot (2i41 
U(z) = 2S (at (etal : 2) +a,c;H (ci . ’z)) X (2). 
i=0 


e Phase Distortion: Perfect reconstruction requires that the analysis/synthesis system has no phase distortion. 
To guarantee the absence of phase distortion, we require that the composite system 
Equation: 


N-1 _ 
Q() = Uz) _ ar) aiciH? (eS +) 2) +0,¢, H? (efte@0)z) 
41=0 


X(z) 


has a linear phase response. (Recall that a linear phase response is equivalent to a pure delay in the time 
domain.) This linear-phase constraint will provide the final condition used to specify the constants {a; } 
and {c;}. We start by examining the impulse response of Q(z). Using a technique analogous to [link] (fifth 
equation), we can write 

Equation: 


Q == si (Sire aces (ao n) hae (saa ae *)) (Sot ie n 


n=0 


Above, we have used the property that multiplication in the z-domain implies convolution in the time 
domain. For Q(z) to be linear phase, it's impulse response must be symmetric. Let us assume that the 
prototype filter H(z) is linear phase, so that {h,,} is symmetric. Thus >, A,,,_, is symmetric about 
n = M —1, and thus for linear phase Q(z), we require that the quantity 

Equation: 


N-1 : , 

2i+1 2i+1 
Re (a;c;) cos (= sk n) — Im (a;c;) sin (« uu n) 
c 2N 2N 


is symmetric aboutn = M — 1, ie., 
Equation: 

2i+1 
2N 


N-1 an 
S- Re (a;c;) cos (« — (M-14 n)) Im (a;c;) sin (« (M—1+n) 
7=0 


2i+1 2i+1 
= Re (a,c; M-1- —I ici) Si M-1- 

2 e (a;c;) cos (« WT ( n)) m (a,;c;) sin (« aN ( n)) 
forn = 0,---, M — 1. Using trigonometric identities, it can be shown that the condition above is 
equivalent to 
Equation: 

; 2i+1 \[p Ce 2+] Ong Dien taes 2+] Oy 1) 
= sin { 7 n e(a;c;) sin | 7 + Im (a;c;) cos {| 7 — : 
= 2N 2N 2N 

which is satisfied when 
Equation: 

Im(a;c;) sin (724+. (M — 1)) ; ( i+1 (M 1) 

= —7 — : 

Re(aici) cos (7244 (M — 1)) 2N 
Restricting |a; |=|c;|= 1, the previous equation requires that 
Equation: 

ac. = ei (M-1) |, 


It can be easily verified that the following {a;} and {c;} satisfy conditions [link], [link], and [link]: 
Equation: 
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Plugging these into the expression for H; (z) we find that 
Equation: 


Hi) = aiH (ce 42) + a, (eit 2) 
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n=0 
7 ae 241 M+N-1 . i+] M+N-1 
= (e728 (n 2 | IT ON (n 2 V inz n 
n=0 
M-1 
= 2 cos (1 a4 (n M+N-t ))Rn go 
n=0 


impulse response of H;(z) 


Repeating this procedure for K; (z) yields 
Equation: 


M-1 
Kitz) = 2 cos (w2#tt (n — 41 ))h, | 2. 
cS 


impulse response of K;(z) 


At this point we make a few comments on the design of the lowpass prototype H(z). The perfect H(z) 
would be an ideal linear-phase lowpass filter with cutoff at w = 2/2N, as illustrated in [link]. Such a filter 
would perfectly separate the subbands as well as yield flat composite magnitude response, as per [link]. 
Unfortunately, however, this perfect filter is not realizable with a finite number of filter coefficients. So, 
what we really want is a finite-length FIR filter having good frequency selectivity, nearly-flat composite 
response, and linear phase. The length-512 prototype filter specified in the MPEG standards is such a filter, 
as evidenced by the responses in [link]. Unfortunately, the standards do not describe how this filter was 
designed, and a thorough discussion of multirate filter design is outside the scope of this course. For more 
on prototype filter design, we point the interested reader to page 358 of Vaidyanathan or Crochiere & 
Rabiner. 
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Ideal (dashed) and typical (solid) 
prototype-filter magnitude responses for 
the cosine-modulated filterbank. Note 
bandwidth relative to [link]. 


prototype filter 


Magnitude response of |H(w)| of MPEG prototype 
filter and the resulting composite response |Q(w) 
where NV = 32 and M = 16N = 512. 
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To conclude, [link] (fourth equation) and [link] give impulse response expressions for a set of real-valued 
filters that comprise a near-perfectly reconstructing filterbank (under suitable selection of {h,;}). This is 
commonly referred to[ footnote] as a “cosine-modulated filterbank” because all filters are based on cosine 
modulations of a real-valued linear-phase lowpass prototype H(z). The near-perfect reconstruction 
property follows from the frequency-domain cancellation of adjacent-spectrum aliasing and the lack of 
phase distortion. 

It should be noted that our derivation of the cosine modulated filterbank is similar to that in Rothweiler 
ICASSP 83 except for the treatments of phase distortion. See Chapter 8 of Vaidyanathan for a more 
comprehensive view of cosine-modulated filterbanks. 

The MPEG standards refer to this filterbank as a “polyphase quadrature” filterbank (PQF), the name given 
to the technique by an early technical paper: Rothweiler ICASSP 83 

Polyphase Implementations: Recall the uniformly modulated filterbank in Figure 4 from "Uniformly- 
Modulated Filterbanks", whose combined modulator-filter coefficients can be constructed using products 
of the terms h,, and e/*”, Figure 5 from "Uniformly-Modulated Filterbanks" shows a computationally- 
efficient polyphase/DFT implementation of the analysis filter which requires only M multiplies and one N- 
dimensional DFT computation for calculation of N subband outputs. We might wonder: Is there a similar 
polyphase/fast-transform implementation of the cosine-modulated filterbank derived in this section? From 
[link] (fourth equation), we see that the impulse responses of { H; (z)} are products of the terms h,, and 
cos (1 7 (n M =1 )) for n = 0,---,M — 1. Note that the inverse-DCT matrix C,' can be specified 
via components with form similar to the cosine term in [link] (fourth equation): 


Equation: 
/ 2 a+1 
[Ci], = — Qn, COS ea ; in=0---N-1. 
le N 2N 


for a = 1/V2, On = 1. 


Thus it may not be surprising that there exist polyphase/DCT implementations of the cosine-modulated 
filterbank. Indeed, one such implementation is specified in the MPEG-2 audio compression standard (see 
ISO/IEC 13818-3). This particular implementation is the focus of the next section. 


MPEG Filterbank Implementation 


Since MPEG audio compression standards are so well-known and widespread, a detailed look at the MPEG 
filterbank implementation is warranted. The cosine-modulated, or polyphase-quadrature filterbank 
described in the previous section is used in MPEG Layers 1-3. (The MPEG hierarchy will be described in a 
later chapter.) This section discusses the specific implementation suggested by the MPEG-2 standard (see 
ISO/IEC 13818-3). 

The MPEG standard specifies 512 prototype filter coefficients, the first of which is zero. To adapt the 
MPEG filter to our cosine-modulated-filterbank framework, we append a zero-valued 513" coefficient so 
that the resulting MPEG prototype filter becomes symmetric and hence linear phase. Since the standard 
specifies N = 32 frequency bands, we have 

Equation: 


M = 513 = 16N +1. 


Plugging this value of M into the filter expressions [link] (fourth equation) and [link], the 27-periodicity of 
the cosine implies that they may be rewritten as follows. 


Equation: 
16N—1 ; 
H;(z) = S- 2 cos (n(n — T))Rn Zz. 
=) “—_——_-—_— 
se impulse response of H,(z) 
16N-—1 : 
K(2) = S- 2 cos ( oa (n+ T))Rn Zo. 
n=0 


impulse response of K;(z) 


Encoding: Here we derive the encoder filterbank implementation suggested in the MPEG-2 standard (see 
ISO/IEC 13818-3). Using x; (n) to denote the output of the i*” analysis filter, we have 
Equation: 


16N-1 


a(n) = > [2 cos (244 (k — £))hy| a(n — k). 
k=0 


The relationship between «; (n) and its downsampled version s; (™m) is given by 
Equation: 


8; (m) = a; (mN), 


so that the downsampled analysis output s; (m) can be written as 


Equation: 
16N—1 
si(m) = > [2 cos (12 (n — £))h,| 2 (mN — n). 
n=0 


Using the substitution n = kN + @for0 < @< N-1, 
Equation: 


s(m) = 2 cos (7 (kN + £— £)) Aunye 2((m —k)N — 2) 
k=0 (=0 “~~ 
repeats every 4 increments of k 
sign changes every 2 increments of k 


15 N-1 

2i+1 N k/2 

- S > cos (122 ((k)N + €— %)) 2(-1)4 "haw ye a((m — k)N - 2) 
k=0 £=0 ae 
repeats every 2 increments of k analysis window 
[link] illustrates this process. 
k=0 k=1 k=2 k=3 k=4 k=15 
0 N 2N aN an 5N 15N 16N 
Lx 


COSINE MATRIX TRANSFORMATION: 
cos (254* (J — 3)) 


8N—1i(m) 


MPEG encoder filterbank implementation suggested in ISO/IEC 13818-3. 


e Decoding: Here we derive the dencoder filterbank implementation suggested in the MPEG-2 standard (see 
ISO/IEC 13818-3). Using y; (n) to denote the output of the i’” upsampler, 
Equation: 


16N-1 


ui(n) = oF [2 cos (7 ae (k+ ))helyi(n — &). 
=0 


The input to the upsampler s; (m) is related to the output y; (n) by 
Equation: 


s;(n/N) when n/N €Z 
palm) = nin) when 
0 else, 


so that 
Equation: 


2i+1 


Dok: en} [2 cos (455 


¥))ulsi( 4): 


Lets writen = mN + £for0 < £< N—landk=pN +4q for 0 < q < N-—1. Then due to the 


restricted ranges of € and q, 
Equation: 


n—k i248 
= WM: — ——— 
N Pp 


EZ => 
N 


Using these substitutions in the previous equation for wu; (7), 
Equation: 


= 


))hpw+e si (m — p). 


= Pa Heal » cos (w444* (pN+ 4+ 4 
Summing u; (mN + £) over i to create u(mN + £), 
Equation: 


N-1 15 
= 2 co 
i=0 p=0 


u(mN + £) nit (pN + £4 


repeats every 4 increments of p 
sign changes every 2 increments of p 


ya 1 ) 20 hpn+e Ses (1 2 


2i+1 
2N 


synthesis ia ne (x { (t+ a) 
(¢+N+4)) p odd 


cos (x 


If we define 


Yo + €+ 
ama Ess, 


+))) hpw+e 8: (m — p) 


+))) si (m — p) 


p even 


(-1) "howe Vern (m — p). 


Equation: 
u(m).= ay cos (w#i4t (7+ 4))s;(m) for 0<j7<2N—1, 
(note the range of 7!) then we can rewrite 
Equation: 
u(mN +2) = SS (-1) 7 howe us(m—p) + > 
p=0,2,---,14 p=1,3,---,15 


[link] illustrates the construction of u(mN + 


Equation: 


v(m) 


= (vo (m) 


£) using the notation 


VIN-1 (m)). 


so(m) —7i=0 


ai(m) —4i=1 


COSINE MATRIX TRANSFORMATION: 
2i+1 (, N 
cos (x 55" (J + )) 


en-i(m) —|i=N-1 


MPEG decoder filterbank implementation suggested in ISO/IEC 13818-3. 


e DCT Implementation of Cosine Matrixing: As seen in [link] and [link], the filterbank implementations 
suggested by the MPEG standard require a cosine matrix operation that, if implemented using 
straightforward arithmetic, requires 32 x 64 = 2048 multiply/adds at both the encoder and decoder. Note, 
however, that the cosine transformations in [link] and [link] do bear a great deal of similarity to the DCT: 
Equation: 


af & an hy an cos (224k); k=0---N-1, 
for ag = 1/V2, aro = 1, 


C= 4/ = eg ak Ye Cos (12241 k); n=0---N-1, 


which we know has a fast algorithm: Lee's 32 x 32 fast-DCT, for example, requires only 80 
multiplications and 209 additions (see B.G.Lee TASSP Dec 84). So how do we implement the matrix 
operation using the fast-DCT? A technique has been described clearly in Konstantinides SPL 1994, the 
results of which are summarized below. At the encoder, the matrix operation can be written 

Equation: 


Uk 


s;(m) = a cos (nw (7 - *))wj(m) for i=0,---,N-1, 


where {wo (m), ---, w2n-1 (m) is created from {x(m),---,xz(m — 16N + 1)} by windowing, shifting, 
and adding. (See [link].) We can write 
Equation: 


si(m) = <5 cos (x a j)w; (m); i=0,---,N-1, 


where, for N = 32, {@; (m)} is the following manipulation of {w, (m)}: 
Equation: 
wis (m) j=0 
Wj(m) := ¢ wig+;(m) + wig_j(m) j =1,2,---,16 


W16+4+j (m) — Ws0-j (m) j = 1%, 18, es sole 


Compare [link] to the inverse DCT in [link] (lower equation). At the decoder, the matrix operation can be 
written 
Equation: 


vj(m) = SiN 5) cos (wt (f+ %))si(m) for j=0,---,2N—-1, 


where {v9 (m), +--+, v2w—1 (m)} are windowed, shifted, and added to compute {u(m) }. (See [link].) It is 
shown in Konstantinides SPL 1994 that, for N = 32, {v,; (m)} can be calculated by first computing 
{vj (m)}: 


Equation: 


v;(m) = pel cos (124 7) 8; (m); 7=0,:--,N-1 


and rearranging the outputs according to 
Equation: 


Vj416 (m) i) = 0,1,---,15, 

0 7= 16; 

—Vag_j (m) j= 17,18,---,47, 
~%;_43(m) 7 = 48,49, ---,63. 


uv, (2) = 


Compare [link] to the DCT in [link] (upper equation). 


MP3 and AAC: MDCT Processing 

In MP3 and AAC coders, the frequency resolution of the polyphase 
quadrature filterbank is increased using a cascaded MDCT stage. We 
describe that here, and give the details of the MDCT stage. 


MDCT Filterbanks 


e Hybrid Filter Banks: In more advanced audio coders such as MPEG 
“Layer-3” or MPEG “Advanced Audio Coding” (the details of which 
will be discussed later), the 32-band polyphase quadrature filterbank 
(PQF) is thought to not give adequate frequency resolution, and so an 
additional stage of frequency division is cascaded onto the output of 
the PQF. This additional frequency division is accomplished using the 
so-called “Modified DCT” (MDCT) filterbank. (See [link].) 


\ Q-bands 
\ Q-bands 
N-band 
Polyphase Quadrature 
Filterbank 
\ Q-bands 


Hybrid filterbank scheme used in MPEG 
Layer-3 (where V = 32 and Q switches 
bewteen 6 and 18) and MPEG AAC (where 
N = 4 and Q switches between 128 and 1024). 


e Lapped Transforms: The MDCT is a so-called “lapped transform.” At 
the encoder, blocks of length 2@ which overlap by Q samples are 
windowed and transformed, generating Q subband samples each. At 


the decoder, the Q subband samples are inverse-transformed and 
windowed. The windowed output samples are overlapped with and 
added to the previous Q windowed outputs to form the output stream. 
[link] gives an intuitive view of the coding/decoding operation, while 
[link] and [link] specify the specific coder/decoder implementations 
used in the MPEG schemes. 


overlapping input windows windowed and overlapped outputs 


COSINE MATRIX TRANSFORMATION: 
2i+1 (2 
cos (735 (j+ n)) 


MDCT filterbank: encoder implementation. 


COSINE MATRIX TRANSFORMATION: 
2i+1 (5 
cos (= 2 + nte)) 


MDCT filterbank: decoder implementation. 


e Perfect Reconstruction: Based on the cancellation of time-domain 
aliasing components, Princen, Johnson, & Bradley show (in ICASSP 
87 and TASSP 86 papers) that the MDCT acheives perfect- 
reconstruction when window {w,} is chosen so that overlapped 
Squared copies sum to one, i.e., 

Equation: 


‘ee Weg + wr, for O0<n<Q-l1. 


The “sine” window 
Equation: 


Wy = sin (35") for O0O<n<2Q-1 


is one example of a window satisfying this requirement, and it turns 
out to be the one used in MPEG Layer-3. 


e Frequency Resolution: With a window length that is only twice the 
number of transform outputs, we cannot expect very good frequency 
selectivity. But, it turns out that this is not a problem. In MPEG Layer- 
3, sine-window MDCTs appear at the outputs of a 32-band PQF where 
frequency selectivity is not a critical issue due to the limited frequency 
resolution of the human ear. In MPEG AAC, a 4-band PQF in 
conjunction with an optimized MDCT window function gives 
frequency selectivity just above that which current psychoacoustic 
models deem necessary (see M. Bosi et al., "ISO/IEC MPEG-2 
Advanced Audio Coding" in JAES Oct 1997). 

¢ Window Switching: Larger values of Q lead to increased frequency 
resolution but decreased time resolution. Time resolution is linked to 
the following: error due to the quantization of one MDCT output is 
spread out over + 2QN time-domain output samples. For signals of a 
transient nature, choosing QN too high leads to audible “pre-echoes.” 
For less transient signals, on the other hand, the same value of QV 
might not be perceptible (and the increased frequency resolution might 
be very beneficial). Hence, most advanced coding schemes have a 
provision to switch between different time/frequency resolutions 
depending on local signal behavior. In MPEG Layer-3, for example, Q 
switches between 6 and 18. This is accomplished using a sine window 
of length 36, a sine window of length 12, and intermediate windows 
which are used to switch between the long and short windows while 
retaining the perfect reconstruction property. [link] shows an example 
window sequence. 


Example MDCT window sequence for MPEG Layer- 
or 


